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ABSTRACT 



The problem of estimating state variables and parameters of 
dynamic systems in the presence of random disturbances and measurement 
noise is examined for a wide range of dynamic systems, including linear 
and nonlinear systems in discrete- and continuous- time „ This is a funda- 
mental statistical problem arising in, but by no means limited to, the 
areas of adaptive and optimum control. Parameter estimation is treated 
as a special case of state variable estimation. 

The general solution of the linear problem is given. Approxi- 
mation techniques are developed for the nonlinear problem and the results 
of simulation studies demonstrating the application of these techniques 
are presented, 

A dynamic programming formulation of the estimation problem is 
developed, 

A discussion of the concepts of controllability and observability 
is given in which particular attention is paid to the problems peculiar 
to discrete-time systems. 

Background material on matrix identities, Gaussian random vectors, 
the generalized inverse of a matrix, and probability density functionals 
is included in a series of appendices. 
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INTRODUCTION 



The introduction of statistical considerations into a wide range 
of engineering problems is a logical consequence of the existence of un- 
certainties of one type or another which frequently make a purely deter- 
ministic approach to these problems unrealistic. The field of automatic 
control abounds in these uncertainties, such as those arising from ran- 
dom disturbances, measurement noise and changing environment. Often un- 
certainties about the process or plant being controlled and about future 
commands are also present. Hence, it is not surprising that statistical 
methods have contributed significantly to the development of modern con- 
trol theory. 

A fundamental problem in the areas of adaptive and optimum control 
is that of estimation of state variables and parameters of a dynamic sys- 
tem when random disturbances and measurement noise are present. If the 
estimates of these quantities are to be used for control purposes, then 
the estimation problem must be solved in real time. The problem then is 
to synthesize an estimator which will act on all available data and con- 
tinually produce up-to-date estimates of the quantities of interest. In 
related problems of ex post facto data analysis the severe real time re- 
quirement is relaxed and, at the expense of computation time, priority 
is given to processing the often limited amount of data available in an 
optimal manner. 

This report is devoted to the examination of this estimation prob- 
lem for a wide range of dynamic systems. Chapter I presents the mathe- 
matical model to be used in the continuous- time version of the problem 
and the generality of the model and its relation to other problems is 
discussed. Because the analogous discrete- time problem is conceptually 






simpler, we first examine it in detail in Chapter II before returning 
to the continuous-time problem in Chapter III. Chapter IV discusses 
the concepts of controllability and observability. Chapter V gives 
the results of some computational studies! while several concluding re- 
marks and suggestions for possible further research are contained in 
Chapter VI. Material of a complementary and supplementary nature is 
included in a series of appendices. 

The principal references for this report are the work of Kalman 
[29, 31 5 32, 3^] a ^d the paper by Bryson and Frazier [lo], An effort 
has been made to use a notation which is compatible with these references. 



.2= 



CHAPTER I 



BACKGROUND AND PERSPECTIVE 



1 . 1 Introduction 

The purpose of this chapter is to lay the groundwork for the 
analysis of the estimation problem and to discuss various aspects of 
the mathematical models considered in the sequel. After taking care 
of some notational preliminaries and reviewing briefly the state 
approach to dynamic systems, the continuous=time model is presented. 
This model is then related to control problems and to classical linear 
filtering theory. The author chose the continuous-time model for this 
preliminary discussion because it is easily related to the model 
appearing most frequently in modern control theory. 



1 .2 Notation 



For simplicity vector notation is used throughout this work. 
All vectors are column vectors. Vectors are designated by under- 
lined lower case letters and matrices by underlined upper case 
letters. For example, the vector x and the matrix A are to be in- 
terpreted as follows; 



x 




diu 4 
1 1 


a 12 9 


9 9 a ln 


a 21 


a 22 


a 2n 


0 

0 


9 

O 


e 

9 


o 

a 4 

ml 


9 

a m2 


9 

a 

mn 
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The transpose of x and A are denoted by x' and A' respectively. The 
inner product of two vectors x and ^ is written as a matrix product. 
That is. 

x’Z 

The Euclidian norm of a vector x is denoted by || x || . 

|| x || = \/x ! x = yx f + ••<> + x^ 




Also 





2 

x 

i 



If A is a positive definite matrix, the following special notation is 
used for the associated quadratic form 




x* Ax = 



£ 

ij 



a xx 
ij i 3 



9 

If x is a function of time, then x(t) is the derivative of x with 
respect to time. That is. 



dx 1 (t)/dt 
dx 2 (t)/dt 



x(t) = dx(t)/dt = 



dx n (t)/dt^ 



The scalar function V(x^ (t) , . . . ,x n (t); t) is written V(x(t),t). 
The gradient of V is a vector denoted by V . The partial derivative of 
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V with respect to t is written V^. That is, 



3V/ 3X 1 



V v = grad V = 



o 



aV/ 9 x 
' n 



V = 3V(x(t) f t)/at 
x> 



Vector valued functions are designated by underlined lower case 
letters . For example , 



f(x,u,t) = 



f -j (X.| poo®, X^ p 9 0 0 0 , U^ 9 t ) 



ijj (x.j p o o o 9 X n 9 Uj 9000, U r 9 t ) 



We use (3f(x Q )/sx) to designate the Jacobian matrix of partial 
derivatives [af^/axj j evaluated at x = Xqo 



The segment of the time-function u(t) on the closed interval 



t. - t - t . is designated by u r , . , . 
0 1 -froth] 



The convention, of describing a dynamic system by a set of first 
order ordinary differential (or difference) equations or, more compactly, 
by a vector differential (or difference) equation is already well estab- 
lished in the control literature „ Introductory accounts of "state-space 
techniques", the name usually given to a body of concepts associated 
with this characterization, are given in references [7,19,68]. However, 
since vector notation w ill be used throughout this report, it is appro- 
priate to review briefly some of the basic ideas and relations. 



Informally, we may say that the future behavior of a system 
depends upon its present state and any forcing functions or inputs which 
may be applied to the system in the future. We limit our discussion to 
systems in which the minimum number of quantities necessary to specify 
the state of the system at time t is finite. The members Xj (t) , „ , . >* n (t) 
of such a minimum finite set are called state variables. The state vari- 
ables are elements of the state vector x(t) which ranges over a set X, 
called the state space. Examples of appropriate sets of state variables 
are the amount of charge on each capacitor and the amount of current in 
each inductor in an electric circuit, the position and momentum of each 
mass of a mechanical system and the output of each integrator of an 
analogue computer representation of a system. 

The equations of state for a continuous-time system are usually 
written as a system of first order ordinary differential equations of the 
form 

o 

X. (t) = f. (x. (t),,,, ,x (t),u (t),...,u (t),t) i = 1 n 

i 1 n 1 r 

where u^ , . . . , u^ are the inputs of the system. The equivalent vector 
equation is 

x(t) - f(x(t),u(t),t) (1.1) 

The analogous equations for a discrete-time system are 

x i ( k+1 ) f^( x 1 (k),ooo, x^ (k) „Uj (k),.,„ , u^ (k),k) i 1 , o o . , n 

and 
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x(k+1) = f(x(k),u(k),k) 



( 1 . 2 ) 



While equations (1.1) and (1.2) encompass a very large class of systems, 
it is well to remember that they do not include distributed parameter 
systems described by partial differential equations, systems with time 
delays described by differential -difference equations and other systems 
of more complicated types. 

As an example of the reduction of an nth order nonlinear differential 
equation to a system of n first order differential equations, consider 
the equation 

y^(t) + g(y 8 y,... ( y^ n " 1 \u,t) = 0 

o / Jt \ 

Let x 1 =y, Xg = y, „ . . , x n = y^ n “ . Then an equivalent system of 
first order equations is 
© 

X 1 (t) = Xg(t) 
x 2 (t) = x^(t) 



x n~1 (t) = *n (t) 

© 

x n (t) =-g(x 1 ^..^pU.t) 

Important special cases of (1.1) and (1.2) arise when the system 
is linear. For a linear system (1.1) becomes 

x(t) = F(t)x(t) + G(t)u(t) (1.3) 

One can easily verify that the solution of (1 .3) has the following form; 

x(t) = I(t 0 t o )x(t o ) + / $(t,s)G(s)u(s)ds 

t« 



= 7 = 



where §(t,t 0 ) is the transition or fundamental matrix of the system (1 .3) 
The transition matrix is the unique matrix satisfying the following rela- 
tions ; 



3t = 

KVV - i 

where I is the identity matrix. For the special case of time -invariant 
linear systems , F is a constant matrix and the transition matrix is given 
by 

oo 

|(t,t 0 ) = exp [ (t-t 0 )F ] = 2 (t “ t o) n £ n 

n=o n! 

As an example of the reduction of a linear differential equation 
to a system of first order equations of the form of (1.3), consider the 
third order time=invariant equation 

y (t) + a 2 y(t) + a 1 y(t) + a Q y(t) = b 2 u(t) + b.ju(t) + b Q u(t) 

Let x^ (t) = y(t)„ Then an equivalent set of first order equations written 
in vector form is 





x, (t) 




" ~ a 2 1 0 " 




x, (t) " 




" b 2 ’ 


_d 

dt 


x 2 (t) 


= 


-a 1 0 1 




X2<t) 


+ 


b 1 


Xj(t) 




1 

0 

p 

0 
o 
o 

1 




x^(t) 




b o . 



The linear version of the discrete-time system (1.2) is 

x(k+1) = F(k)x(k) + G(k)u(k) (1 .4) 
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The solution of the discrete-time linear system takes the following form; 



n-1 

x(n) = $(n,k)x(k) + |(n, j+1 )G( j)u(j) n> k 

j=k 



where 



$(n„k) = F(n-1 ) £(n-1 ,k) = F(n-1 ) " 0 F(k) , n>k 
f (k,k) = I 

Procedures for reducing difference equations to a set of first order 
difference equations are analogous with the ones illustrated for differ- 
ential equations. For a more detailed discussion of linear systems from 
the state point of view see [33 ]• 

1 .4 Mathematical Model 

1 .41 Statement of the Continuous -Time Problem 

We begin our discussion with a mathematical statement of the 
continuous-time estimation problem. The motivation for the mathematical 
model used will become apparent in the discussion immediately following. 
The reader may wish to read over this section lightly at first, returning 
after the relation of the model to various problems is developed. 

The problem to be considered is the estimation of the state 
variables of a nonlinear dynamic system which is excited by white 
Gaussian noise. It is assumed that a nonlinear combination of the state 
variables corrupted by additive white Gaussian noise may be observed 
over a finite time interval. 

In particular, we consider processes which may be described by a 



- 9 - 



system of nonlinear differential equations which can be written in 
vector notation as 



o 



X = f(x(t),t) + G(t)w(t) 



( 1 . 5 ) 



The observed signal is 



z(t) = h(x(t) „t) + v(t) 



( 1 . 6 ) 



where ; 

x is an n- vector, the state vector of the system 
w is an m- vector, m - n, the random input 
£ is a p- vector, the observation 
G is an n x m matrix 
v is a p- vector, the measurement noise 

The functions w and v are white Gaussian noise processes with zero 
means and the following covariance matrices; 



where E is the expectation operator, S is the Dirac delta function, 
and £(t) and R(t) are positive definite symmetric matrices which are 
continuously differentiable in t. It will be necessary to assume that 
f and h satisfy certain differentiability conditions with respect to 
their arguments. 

This model may be represented by the block diagram of Fig 0 1 » 

The wide arrows are used to indicate the flow of vector quantities. 



E[w(t)w*(T)] = 2 (t) £(t-7) 

E [ v(t)v> (T)J = R(t) £ (t^t) 

E[w(t)v'(T)J = 0 




for all t and T 



- 10 - 



The output of an oval block is obtained by the indicated nonlinear 
operation on its input. 



measurement noise v(t) 




Fig. 1 . Model of the Continuous-Time System 

Observations begin at an initial time t Q . The problem of real 

time estimation of state variables is to use the history of observations 

from t up to the present time t (i„e. , z , . -,) to continually produce 

° ~l c o» t J 

up-to-date estimates of the present state x(t). 

It will be convenient to imbed this problem in the more general 
problem of estimating the entire segment x r ,. This point will 

be discussed in subsequent chapters. 

1 .42 Relation to Optimum Control Problems 

The so-called optimum control problem is usually formulated 
in the following manner. The physical process to be controlled is 
represented by a system of ordinary differential equations of the form 

dx^ ( t ) / dt ~ f ^ ...o .Xj^.u^ .... ,u^ $ t ) ,i - 1 .... ,n 

where x 1 ,..„„x n are the state variables of the process, u^,...,u r 



- 11 - 



are the control variables and t is time. These equations are usually- 
written in vector notation as 

x(t) = f(x(t),u(t),t) 

The control problem is that of choosing u(t) from a set of 
admissible controls so that the state of the system follows an optimal 
trajectory. Optimality is usually defined as the minimization of a 
scalar performance index which may in general depend on u, x and t„ 

For example, if it were desired to move the system from an initial state 
^ to a specified final state Xj , the performance index might include 
time of transit, energy expended, or a combination of these and other 
criteria. 

The practical considerations of model inaccuracies and external 
disturbances make it desirable to have a closed loop system, that is, 
one in which the control u(t) is specified as a function of the state 
x(t) rather than as a preprogrammed function of time. 

Despite its short history, the literature devoted to various 
aspects of this problem is voluminous. Various mathematical techniques 
including the calculus of variations [l 4, 18, 36 ] , the "Maximum Principle" 
of Pontryagin [52,55iAl] » Bellman's dynamic programming [3 »5ol6] » and 
steepest descent [9,35] have been used in connection with this class of 
problems. In almost all of this work it is necessary to assume that the 
state variables of the system are observable or exactly measurable or 
may be calculated instantaneously from other observable quantities. Be- 
cause of external disturbances, instrument noise and the intrinsic diffi- 
culty of physically performing the unbounded operation of differentiation, 
the assumption that the state variables of the system are known exactly 
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is almost never justified,, 

It is known that for linear systems with quadratic performance 
criteria it is possible to solve the estimation problem and the optimi- 
zation problem separately and still obtain the over-all optimum system 
(see [23] or [28]), No analogous result can be anticipated for nonlinear 
systems; that is, when estimates are used in place of the actual values 
in nonlinear systems we cannot assume that the over-all system will st. il 1 
be optimal. From the viewpoint of application, however, if estimates of 
state variables were available, one would have no alternative but to use 
them, since the problem of joint estimation and optimization for a non- 
linear system belongs to a class of extremely difficult unsolved statis- 
tical optimization problems. Indeed, either the estimation problem or 
the optimization problem alone is extremely formidable, 

A diagram of the over-all control system using the estimates of 
the state variables is shown in Fig, 2, The estimator operates on all 
observable quantities, in this case u and z, to produce an estimate of 

random 




Fig, 2 



Block diagram of over-all control system 



x which it feeds to the controller. The controller produces u which 
acts on the process. The value of u is also fed to the estimator. 

The mathematical model for this noisy system is 

x(t) = £(*(*) »li(t)»t) + G(t)w(t) 
z(t) = h(x(t)„t) + v(t) 

This is identical with the model of (1.5) and (1.6) except that we have 
the additional known quantity u appearing in the first equation. In our 
model this is accounted for by the explicit dependence of f(x(t),t) in 
(1.5) on the time parameter t. 

There is no loss of generality in assuming w(t) to be white noise. 

It has been emphasized in the early work of Bode and Shannon [8], and 

Zadeh and Ragazzini [ 67 ] that a Gaussian process with a rational spectrum 

may be considered equivalent to the output of a linear system excited by 

white Gaussian noise. If we denote the state variables- of this equivalent 

linear system by x^ and the state variables of the process to be con- 
(1 ) 

trolled by x \ we may form a composite state vector 
- = |x< 2 >. 

and obtain equations of the form of (1.5) and (1.6). 

A simple example may help to clarify this point. Consider a 
scalar process of the form 

x(t) = f(x(t)„u(t)) + r(t) 



z ( t ) = h(x(t),t) + v(t) 



where r is a Gaussian random process with the spectrum 



N 2 



^rr^ _ ~~2 7 

1 x co ^ 



Then we may consider r as the output of the linear system 



r = = a r + w 



where w is a white Gaussian noise process with the spectrum 




N 



2 



If we designate x by x^ and r by x^ } we obtain the following system 
of equations; 





*<’><« ' 




f(x^(t),u(t)) + x (2, (t) 




0 


_d 




= 




+ 




dt 


_x (2, (t) 




- ax (2 >(t) 




1 



z(t) = h(x^(t)„t) + v(t ) 

which are of the form of (1,5) and (1,6), 

1 ,43 Relation to Adaptive Control 

Although there is no universally accepted definition of adaptive 
control, most investigators would agree that a central problem in the 
theory of adaptive control is the identification problem [46], Changing 
environmental conditions are frequently incorporated in the mathematical 
model of the system as changes of certain parameters in that model. The 



<=1 5“ 



identification problem is concerned with automatically evaluating current 
values of the parameters. Because these parameters usually enter the 
model multiplicatively , the problem of estimating them is often equivalent 
to that of estimating state variables of a nonlinear dynamic system. 

Estimation of parameters will be treated as a special case of es- 
timation of state variables. The basic technique is to augment the 
state-space of the system to include the parameters themselves and addi- 
tional quantities to account for the nature of the statistical variations 
of the parameters. For example, an unknown constant parameter b would 

e 

be handled by letting x n+ -j equal b and adjoining the equation x n+ ^ (t) = 0 
to the basic system of equations describing the process. A randomly varying 
parameter is treated as the output of a linear system excited by white 
noise. In this case it is necessary to adjoin the states of this fictitious 
linear system to the states of the basic process. 

Let us consider a simple example to illustrate these ideas. A 
problem of considerable importance is the control of systems with lightly 
damped mechanical resonances [46] . Frequently there is a problem of tuning 
the compensating network as the resonant frequency of the system drifts due 
to changing environmental conditions, such as are commonly experienced by 
high performance aircraft. For a second order system a suitable model 
for this situation might be 

• # o 

y + b y + (c Q + c) y = u + w 1 

o 

c + a c = w£ 

z = y + v 

where the random parameter c is considered to be the output of a first 
order system excited by white noise. Letting 
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x 1 = y 



x 2 = y 



= c 



we may rewrite these equations in the form of (1.5) and. (1.6) as follows; 



_d 

dt 



‘ X 1 ■ 




X 2 


= 


-*3 . 





x„ 



»b x 2 - (c Q + x^) Xj + u 
• a Xj 





0 


0" 




+ 


1 


0 






0 


1 





w„ 



w_ 



z = x 1 + V 

Again the explicit dependence of f(x(t),t) in (1.5) on t accounts 
for any known inputs such as control forces or test signals. 



1 .44 Relation to Linear Filtering Theory 



The application of statistical methods to automatic control 
problems began with the classic work of Wiener [ 65 ] on linear filtering 
and prediction theory. The early exposition by Bode and Shannon [8] 
presented the principal results of the Wiener theory in a form more 
easily understood by engineers. At about the same time, an extension of 
the Wiener theory which was particularly significant from a control point 
of view was made by Zadeh and Ragazzini [ 67 ]. Both of these early papers 
[8, 67 ] emphasized the viewpoint that stationary random processes with 
rational spectra could be thought of as the output of linear systems 
excited by white noise. Several investigators [1,13,27] have extended 
these ideas to multidimensional systems. The work of Davis [ 13 ] gives 
a general solution to the problem of factoring spectral matrices and 
relates the linear stationary theory to the state-space approach. Today 
material on the one-dimensional Wiener theory and its extensions is avail- 
able in a large number of standard textbooks; for example, [11,12,39,^2, 
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49, 57 , 60, 63]. 

The problem we consider is a nonlinear version of the growing 
memory filters discussed by Kalman and Bucy [34] who studied random 
processes described by the linear vector differential equations 

o 



x(t) = F(t)x(t) + G(t)w(t) 


(1.7) 


z(t) = H(t)x(t) + v(t) 


(1.8) 



which are the linear form of (1.5) and (1.6). They give the general 
solution to the problem of linearly estimating and predicting x(t) 
based on a finite observation interval. If F, G, and H are constant 
matrices and the observation interval extends to negative infinity, this 
is a multivariate Wiener filtering problem in which the random process is 
specified by the linear system (1.7) and (1.8) rather than by its spectral 
matrix. For this stationary case, (1.7) and (1.8) may be obtained by 
factorization of the spectral matrix [13]. It is indeed unfortunate that 
the important results of Kalman and Bucy are not well understood by many 
control specialists. This is, no doubt, due in part to the elegant but 
intricate nature of their original derivation. An interesting by-product 
of our study of the nonlinear problem has been a simplified derivation 
of their results for the corresponding linear problem, (see Chapter III.) 

The idea of considering a random process to be the result of exciting 
a linear system with white noise has great intuitive appeal. This is es- 
pecially true for a stationary process where a suitable linear system may 
be postulated as the result of direct measurement of the spectrum of the 
process. For the time-varying system of Kalman and Bucy it is not at all 
clear how to make measurements on a non-stationary process to find a suit- 
able system. However, their work has direct application to control problems 
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in which prior information about a time-varying system is available. 

There is considerable appeal to the idea of considering a 
stationary non-Gauss ian random process to be the result of exciting 
with white noise a nonlinear time- invariant system having a finite number 
of state variables. Again it is not at all clear how to make measurements 
on the process to determine the appropriate system. Our work is applicable 
to cases in which prior information about the nonlinear system is available, 

1 ,45 Generality of the Model 

It has already been pointed out that any known inputs such as 
control forces or test signals are taken into account by the explicit 
dependence of f(x(t)„t) on t. We also showed how estimation of parameters 
could be treated as a special case of estimation of state variables. 

There is no loss of generality in assuming £(t) is positive 
definite because of the presence of G(t) in (1,5). The restriction that 
R(t) be positive definite is more basic. To relax this restriction for 
the continuous-time problem is to admit the possibility of differentia- 
ting the observed signal [29], 

A more general version of (1,5) would be 

x(t) = 0.9) 

The basic objection to (1,9) is that extreme care must be taken when per- 
forming nonlinear operations on white noise because white noise is defined 
by a limiting process and the appropriate limit may fail to exist. For 
example,, it is meaningless to square white noise. This objection may be 
avoided by considering the equation 

x(t) = f(x(t),t) + G(x(t),t)w(t) ( 1 . 10 ) 
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For simplicity, we shall, in the main, confine our attention to systems 
in which the white noise enters linearly as in (1 .5) , pointing out where 
extensions can be made to include systems of the type (1.10) and well de- 
fined versions of (1.9). 

1.5 An Intuitive Approach 

Before becoming involved in a maze of equations it is perhaps 
worthwhile to see in which direction an intuitive approach might lead. 

Consider the problem of estimating the state variables of a non- 
linear system for which all inputs are known (no random disturbances) 
and the observations at the output are noisy. A simple approach to this 
problem is to feed the known inputs to a model of the nonlinear system 
which hopefully, after some initial transients have subsided, will behave 
in the same manner as the system itself. The model would be constructed 
so that its state variables were easily measurable. Such a scheme is 




Fig. 3» Open loop model approach 



We shall say that the scheme of Fig. 3 uses the model in an open loop 
manner. While in theory this approach should yield good results, the 
practical considerations of model inaccuracies and random disturbances 
would render it useless in all but the simplest of cases. 
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The obvious way to combat model inaccuracies and random dis- 
turbances is by the use of feedback. This is done in Fig. 4. 




Fig. 4. Closed loop model approach 



The closed loop configuration of Fig. 4 has great intuitive appeal. 

While it is not clear how the error should be processed in order to 
properly adjust the model 9 we would expect that something could be 
worked out. It will be shown later that the model approach may be 
derived as the result of considering the basic equations (1 .5) and (1 .6) 
and that there is a rational way of processing the data. 

It is amazing how many seemingly diversified topics fall into 
the class of closed loop model schemes. Kalman and Bucy [34] show that 
the solution of the linear filtering problem may be interpretated in this 
manner. Davis [l 3 1 gives a model interpretation when the measurement 
noise is not white. The very general Wiener theory of nonlinear systems 
[46, 66] assumes that nothing is known about the system and, using white 
Gaussian noise as the known test input, develops a model in terms of an 
orthogonal series. 
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A variant on this theme is the optimization of the system by ad- 
justing it to behave more like the model, The recent work of Sakrison [56] 
and Kushner [ 38 ] in this area may be cited as examples, A large number 
of optimization and/or learning schemes using models are to be found in 
the adaptive control literature [ 43 , 46 ], 

The various schemes using models differ in the amount of prior 
information assumed and the way in which observations are processed to 
make improvements on the model or system. In our work it is assumed that 
the system is represented by known equations of the type (1,5) and (1,6), 

In the preceding sections it was shown that many problems fall into 
this category. There are many situations, however, in which adequate 
knowledge of the physics of the system is not available and one is unable 
to write equations of the form of (1,5) anjd (1 ,6) even using the technique 
of treating unknown parameters as additional state variables. Our work 
is not directly applicable to these situations. 
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CHAPTER II 



ESTIMATION OF STATE VARIABLES FOR DISCRETE-TIME SYSTEMS 
2.1 Introduction 

This chapter is devoted to the problem of estimating state 
variables for discrete- time systems # This problem is completely anal- 
ogous to the continuous-time problem discussed in the preceding chapter# 
The discussion of the usefulness and limitations of the continuous-time 
model presented there applies also to the discrete- time model to be 
introduced forthwith# 

Discrete- time systems are usually approximations of continuous-time 
systems# They assume great practical importance because digital computers 
work in discrete-time and complicated problems usually must be solved on 
digital computers# The extent to which the ability to use digital com- 
puters to obtain numerical solutions to complex problems has influenced 
modern technology is evidenced in the fact that today an effective com- 
putational algorithm is often considered to be a solution to a problem 
which, because of the limitations of purely analytical techniques, would 
not even have been investigated a decade ago# The possibility of using 
special purpose digital computers as control system components is to a 
large extent responsible for present trends in control theory. We shall 
continually be mindful of computational difficulties as the investigation 
of the estimation problem proceeds# 

The equations which characterize the remainder of this work seem 
quite formidable at first glance# The reader should take heart in that 
the underlying ideas are simple and most derivations involve nothing 
more complicated than differentiation# 
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2*2 Problem Statement 



The problem to be considered is the estimation of the state 
variables of a discrete- time nonlinear system which is excited by a 
sequence of independent Gaussian random vectors* It is assumed that 
nonlinear combinations of the state variables corrupted by additive in- 
dependent Gaussian noise may be observed* 

In particular, we consider systems which are described by vector 
difference equations of the following form; 



where 

x is an n= vector, the state vector of the system 
w is an m= vector, m - n, the random input 
z is a p- vector, the observation 
G is a n x m matrix 
v is a p- vector, the measurement noise 
h and f are vector-valued functions 
...,w(k),w(k +1 ),*** and v(k),v(k +1 
are sequences of Gaussian random vectors with zero means and the following 
covariance matrices 5 



x(k+1) = f(x(k),k) + G(k)w(k) 



(2*1) 



Hie observed signal is 



£(k) = h(x(k),k) + v(k) 



( 2 . 2 ) 



E [w(j)w«(k)] = £(k) c 

Efl(j) X'(k)] = R(k) 

E (w(j) I s (k)] =0 




for all intergers j and k 
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£ and R are positive definite matrices, E is the expectation operator 
and & is the Krone cker delta. Again there is no loss of generality 
by assuming that £ is positive definite. The restriction that R be 
positive definite will later be relaxed. 

Having observed a finite sequence f z(o z (n) ] , we may, in 
general, seek an estimate of an entire sequence of states { x(o),...,x(n), 
,,,x(n-hn)J , We shall call this the estimation problem. In this formula- 
tion we include as special cases the filtering problem, where an estimate 
of the current state x(n) is sought? the smoothing problem, where an es- 
timate of the sequence { x(o),.» 0 ,x(n)j is of interest; and the prediction 
problem, where an estimate of a future state x(n-tm) is desired. 

As in the con tinuqus- time problem, we may easily include as 
state variables unknown or randomly varying parameters. Any known 
forcing functions, such as control inputs or test signals are accounted 
for by the explicit dependence of f(x(k),k) on the time parameter k. 

Such known parameters will usually simplify the estimation problem since 
they provide additional information about the system behavior. 

The model for the discrete-time system may be represented by the 
block diagram of Fig. 5® 



measurement noise v(k) 



random 



input 

w(k) 





+ 



z(k) 

' observation 



> 




Fig* 5 o Block diagram of the discrete«time system 



Kalman [31] first solved the linear filtering and prediction 
problems for the linear version of (2,1 ) and (2.2) given by 



In recent independent work Ho [25] has used an approach similar 
to the one we shall use to re-solve this linear problem. 

The systems described by (2.1) are a subset of the systems defined 
by the more general equation 



Extensions to include these more general systems will be pointed out, but 
it is more convenient to work with (2.1) as the basic equation. 

Finally, we should note that the time parameter k designates 
the time of the kth observation. There is no need to assume that obser- 
vations are equally spaced. There is no loss of generality in assuming 
that observations begin at k=o. 

2.3 The Probability Distribution for the Sequence of States 

Let us first examine the problem of estimating the sequence of 
states [ x(o) ,... ,x(n)] , having observed the sequence j' z(o ),..., z (n)] . 
Proceeding in a straightforward manner, we consider the a posteriori 
probability density function 



which is the conditional joint probability density function for the 



x(k+1 ) = F(k)x(k) + G(k)w(k) 



z(k) = H(k)x(k) + v(k) 



x(k+1 ) = f(x(k),w(k),k) 



(2.3) 




,x(n) | z (o) , . . . ,z (n) ) 
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sequence of states [ x(o) , , , . »x(n)} on the assumption that the observation 
sequence [ :s(o)„„,, ,z(n)j has occurred. This function will be, in general, 
extremely complex. Using Bayes' Theorem, we obtain 




,x(n)|z(o) 



z(n)) 





0 o 0 © o 



z(n)) 



(2.4) 



Let us consider the terms of interest which appear in the 
numerator of this expression. Using (2 0 2) and the independence of the 
measurement noise, we may rewrite the first term as 



ment noise v(k)„ 

The second term may be rewritten in the following form; 
P XN (x(°),ooojX( n )) = P 0 (x(o)) e Pi (x(l )|x(o)) ••• 

p n (x(n) I „ o ,x(o) ) 

The independence of successive values of w makes (2,1) a Markov pro- 
cess [26], That is, 

Pkte ( k )|£( k=1 )»••• »*(°)) = PkCsOOl^Ck - 1 )) 

Substituting into (2,4), we obtain 



n 



P z , x (z(o) 

z nI x n 



|> O O « f 



z(n)|x(o),..,,x(n)) = TT p (z(k)-h(x(k),k)) 

k=o k 



k=o 
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x(n)| z(o) 



p„ (x(o), 
X N |Z N “ 



,z(n)) 



71” P v (z(k)-h(x(k),k)) p Q (x(o)) 7T Pk(2( k )| x (k-1 )) 
k=o k k^1 

Pz N (2(o),... t z(n)) 



( 2 . 5 ) 



We assign an a priori Gaussian distribution with mean m and co- 

variance matrix P(o) for the unknown initial state x(o). it will be 

convenient, although not essential, to assume P(o) is non-singular. This 

restriction will later be relaxed. 

Before proceeding further, it is necessary to examine the nature 

of the input sequence. Let r(k) = G(k)w(k). Then ...,r(k),r(k+1 ),... 

is a sequence of independent Gaussian random vectors with zero means. 

The covariance ma,trix for r(k) is G(k)2(k)G* (k). If we designate the 

probability density function for r(k) by p and then use the fact that 

r k 

x(k) = f (x(k-1 ),k-1 ) + r(k-1 ) 
we obtain formally 



P k (*( k ) )) = P r ^ ^ (x(k)-f(x(k-l ),k-1 )) 



If GQG * is singular, then r is confined with probability one to a 

hyperplane in n-space of dimension equal to the rank of GQG 1 , and 

without resorting to delta functions we are unable to write an explicit 

expression for the probability density function p . We note, however, 

r k 

formally 

Px- k (£( k )) = /Pr k iw k ^ k ^2( k )) P^feOO) d 2£( k ) 



( 2 . 6 ) 
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where the conditional density function p , may be interpreted as a 

r k' w k 

constraint, since r(k) is known once w(k) is specified. 

If, on the other hand, the covariance matrix GQG 1 is non-singular, 

an explicit expression may be given for the Gaussian probability density 

function p . In this case all the terms in the numerator of (2,5) may 
r k 

be expressed explicitly since p , p , and p are all Gaussian with known 

r k v o 

means and covariance matrices. Then, (2,5) becomes 



P ■ (x(o),,..,x(n)j z(o),.,.z(n)) = C(z(o),,..,z(n)) • 

X N |Z N 

A p 

exp '- I ||z(k)-h(x(k),k) || , - £ II x(°)-E II i 

k=0 R (k) P (o) 



n 1 2 

i YZ ||x(k+1 )-f(x(k),k) || _1 

k=0 [GQG*] 



(2,7) 



where C(z (o) , , , , „£(n) ) is a normalizing factor depending only on the 
observed output sequence and known constants, T)ie norm notation has 
been used to emphasize the positiveness of the quadratic forms appearing 
in this expression. 



2A The Estimation Procedure 

Now that an expression for the a posteriori probability density 
function has been obtained for the case in which GQG * is non-singular, 
the nature of the difficulties introduced by nonlinearities is apparent. 
One might choose an estimate x so as to minimize the conditional expecta- 
tion of some appropriate loss function. That is 

Min I E [ L(x-x)] 
x [ x|z 
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For a quadratic loss function defined by a positive definite matrix 
W, it is well known [29] that the best estimate is the conditional mean. 

To see this let 



2 

L. (x-x) = ||x-x || = x’Wx - 2x’Wx + x'Wx 

1 W 



Then 

E [ L 1 (x-x) ] = x'Wx - 2x t Wx + x'Wx 

x|z 

Differentiating with respect to x and setting the result equal to zero 
we find that the best estimate is x, the conditional mean. 

The task of finding the mean of (2.7) is overwhelming because of 
the nonlinearity of the system. 

It is also well known [j&l that for a linear loss function of the 

form 

L £ (x-x) = E ^ l x i"\l c i> 0 



the ith coordinate of the best estimate is the median of the marginal 
distribution for x^. To see this note that 



E [Mx-4)] = EE c. |x.-x | 

xiz * x 1 



and 





(x ± -x. ± ) P i (x ± ) dx ± 



+ a J p i (x i> ^i 

*i 



Taking the derivative of this expression with respect 



to x^ and setting 



= 30 - 



the result equal to zero, we obtain the desired result. 



x i r 03 

I Pi^) ^ = A J Pi^) ^i 

— o o J 

Again the nonlinearity of the system is our nemisis if we try to apply 
this result to (2.7). 

An appealing loss function for many applications is one that 
weights equally all errors greater than some threshold value. This 
would express the idea that once the size of the error exceeded a spec- 
ified level, an increase in the size of the error would not be signifi- 
cant. An example of such a loss function is 



L3QC-X) = 




V. 



c 



2 

, for || x-x || - c 

W 

2 

, for ||x-x || > c 

W 



For this type of loss function we would expect that the mode 
would be a better estimate than either the mean or the median. To gain 
some insight into this situation, consider the loss function and the proba- 
bility density function for a scalar random variable 6 shown in Fig. 6. 





Fig. 6. A probability density function and a loss function 
It is clear that for the situation of Fig. 6 the expected loss 
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CD> 



(area of the product) will be minimized by locating the valley of 
L(0~0) to correspond to the peak of p(0). That is, the mode would be 
the suitable estimate. 

It is known (see [29] or [ 58] ) that if the distribution function 
„ (x) is unimodal and symmetric about its mean, then the conditional 
mean is the best estimate for a large class of symmetric loss functions 
including , Lg and above. We hasten to point out that in this case 
the mean and the mode coincide and one might just as well say that the 
mode were the best estimate. 

The estimate we shall use is the mode of the a posteriori dis- 
tribution. This is closely related to the method of maximum likelihood; 
the difference being that we take the Bayesian point of view of assigning 
an a priori distribution to the initial state. 

For GQG ' non-singular, the estimate will be obtained by minimizing 

2 n 2 

2J_ = || x(°)“E II i + £ II z(k)-h(x(k),k) || 1 

P” (o) k=o R -1 (k) 

n i 2 

+ £ II x( k+1 )-£(*( k )» k ) II -1 

k=o [G^}»] (2.8) 

with respect to the sequence ( x(o) ,... ,x(n)} . This is equivalent to 
minimizing with respect to the sequences j x(o),...,x(n)j and (w(o),..., 
w(n-1 )] , the expression 

2 n 2 n-1 2 

||x(o)-m || « + £ ||z(k)-h(x(k),k) || . + X] IliiCk) || 

P (o) k=o R“ (k) k=o (k) 

subject to the constraint 

x(k+1) = f (x(k),k) + G(k)w(k) 
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Introducing an n- vector of Lagrange multipliers \ to incorporate the 
constraint, we may minimize 

I n = i ||x(°)»m || + YZ 2 llz( k )-Mx( k )* k ) II 

n P (o) k=o R“'(k) 

+ f:{i«w(k) |1 1 + A* ( k ) t x(k+1 )-f(x(k),k)-G(k)w(k)]} (2.9) 

k=o £ (k) 

From (2.6) we see that the function I R may also be minimized when GQG 1 
is singular. 

The more general problem of estimating the state sequence 
|x(o),...,x(n),...,x(n-to) ] after the sequence (z(o),...,z(n)j has been 
observed may be reduced by an analogous development to that of minimizing 



I 



n,m 



i ||x(o)~m 



P" 1 (o) 



xi 2 

JZ i ||z(k)-h(x(k),k) 1 
£o R” 1 (k) 



+ 



n-hn~1 f 2 

YZ I i II w(k) || +X‘(k) [x(k+1 )“f (x(k),k)-G(k)w(k)] \ 

k=o (k) 

(2.10) 



A comparison of (2.9) and (2.10) yields 



■^n.m ^n 



n4m~1 ( 2 -) 

YZ | lllsOO II i + l'(k) [x( k +1)"f(^(k),k)-G(k)w(k)] 

k=n ^ (k) 



( 2 . 11 ) 



From (2.11) we conclude that I may be minimized by first mini- 

XI p XU 

mizing I n and then setting w(k) equal to zero for k=n,...,n+m-1 . This 
leads us to the important conclusion that predictions are simply extrapola- 
tions of present estimates. Specifically, if we let x(k|n) denote the 
estimate of x(k) given ( z(o),... ,z,(n)j , then 
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x(n+1 In) = f(x(nln),n) 
x(n+2|n) = f(x(n+1|n), n+1 ) 



x(n-hnln) = f (x(n+m-1 1 n) , n-Ha-1 ) (2.12) 

Several investigators (see [13] and [29]) have pointed out that extra- 
polation of state variables is an interpretation of what the Wiener 
predictor does in the linear case. 

For the more general system (2.3) » the basic process is still 
Markovian. In this case a modified version of (2.9) is obtained 



2 n 2 

I n = i||s(°)-s II 1 + Yj ill z(k)-h(x(k),k) || 

P (o) k=o R (k) 



n-1 , 2 

£ f i II H( k ) II 

k=o 1 



(k) 



A'(k) [x(k+1 )-f(x(k),w(k),k)]j (2.13) 



For this case (2.12) becomes 

x(n+1ln) = f (x(nln),0,n) 



(2.14) 



2.5 Dynamic Programming Formulation 

Having reduced the estimation problem to a minimization problem, 
the possibility arises of proceeding sequentially by a dynamic pro- 
gramming formulation [3, 4, 5, 20 ] and at each step obtaining an estimate 
of the present state. This can be done is either f”^ exists or GQG 1 is 
non-singular. 

Case i . f" exists for k=o, . . . „n-1 
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If f“^ exists we have from (2.1) 

x(k-1 ) = f”^ (x(k)-G(k-l )w(k-1 ) ,k-1 ) (2.1 5) 

In order to establish a sequential procedure , ,.we define the following 
scalar "cost" function; 



2 2 
V 0 (x(°)) = IU(o)-m || 1 + II z(o)-h(x(o),o) || 

r (o) r- 1 (o) 



V n (x(n)) = Min /||x(o)-m|| a 

w(o),...,w(n-1 ) ] r (o) 



2 ru.1 

I 

R-'OO 



+ II £(k)=h(x(k)»k) || 1 + Y\ ||w(k) || , f n=1,2„ 

Wo R (k) Wo 2 _1 (k)J 



(2.16) 



subject to the constraint (2.15). The separability property of V n 
enables us to rewrite this relation in the following form 



V n (x(n)) = Min 

w(n-1 ) 



Min 

w(o),„..,w(n-2) 



||x(o)-m || 

P (o) 



n»1 2 

+ ^ l|z(k)-h(x(k),k) 1 , + 

Wo R” 1 (k) 



Y\ O o 

E l 5 (k) I , 

k=o £ (k) 



2 2 

+ 1| z(n)^h(x(n),n) || « + ||w(n-1) || 1 

R” 1 (n) 2” 1 (n-1 ) 



n 1 p . . o 



subject to the constraint (2.1 5)» This is equivalent to 



V (x(n))= Min 

w(n~1 ) 




(x(n=1)) + 



|[z(n)-h(x(n),n) || . 

R'(n) 



+ II w(n~1 ) 
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again subject to (2.15). Finally, using (2.15), we obtain the functional 
equation 

V (x(n)) = Min J V 1 (f” 1 (x(n)-G(n-l )w(n-1 ),n-1 )) 
n w(n-1 ) | "- 1 

2 2 ] 

+ llw(n-1)|| 1 +|| z(n)-h(x(n),n) || > 

2 (n~1 ) R (n) J (2.17) 

Let c be any particular value of x(n). We may interpret V n (c) 
as a measure of the unlikeliness of the most probable sequence of states 
| 4 (o), . . . ,x(n)] in which x(n) takes on the particular value c, given the 
observed sequence ( z^(o),... „z,(n)} and the a priori distribution for x(o). 

The estimate of x(n) is that value of x(n) for which V n (x(n)) is minimum. 

This corresponds to choosing the x(n) of the most probable (least unlikely) 
of all possible sequences of states given the observation and the a priori 
distribution. 

This procedure may be extended to the more general system (2.3) 
if x(n-1 ) is uniquely specified by x(n) and w(n-l). For such systems 
there will exist a relation of the form 

x(n~1 ) = &(x(n),w(n»1 ),n-1 ) (2.18) 

Using (2.18) in place of (2.15) in the preceding development, the follow- 
ing functional equation is obtained; 

v n (x( n )) = Min J V n 1 (£(x(n),w(n»1 ),n-1 )) 
w(n~1 ) ( 

2 2 1 

+ ||w(n-1)|| +|| z(n)-h(x(n),n) || 1 > 

fiT (n-1) R (n) (2.19) 
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Case ii . G(k)2(k)G' (k) is non-singular for k=o,...,n. 

Let us first consider the problem of predicting one step ahead. 
Then the function to be minimized is 



2<J . ~ f] j ||z(k)-h(x(k),k) II 2 +|| £(k+l)-f(x(k),k) || 2 .1 I 

nJ k=o 1 R" (k) [G2G«] J 



+ || x(o)-m || « 

P (o) 



We define a "cost" function as follows; 



W 0 (x(°)) = II x(o)-m || 1 

£ (o) 



w ri+1 (i( n+1 )) = Min <J f|x(°)-2i || + + YL I'll !(k)-h(x(k),k) || 

n1 x(o),...,x(n) ] P~\o) k=o L R"\k) 



+ || x(k+1)-f(x(k),k) || .1 

im.'] 



n=1 ,2,, 



( 2 . 20 ) 



Using the separability property as before, this relation may be re. 
written in the following form; 



W n + 1 (x(n+1 ) ) = Min J Min 

x(n) | x(o ) , o » o ,x(n-1 ) 



2 n-1 r 2 

||x(o)-m||^ 1 + II 2 (k)-h(x(k),k)|| 1 

P (o) k=o L R“‘ ( 



+ ||x(k+1)-f(x(k),k) || + j 

[GQG‘] 



+ || z(n)-h(x(n),n) || , 

R“ (n) 



+ || x(n+1 )=f(x(n) 9 n) 
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Using the definition of W n (x(n)), we obtain the following functional 
equation; 

I 2 

W +1 (x(n+1))= Min <^W (x(n)) + || x(n+1 )-f (x(n),n) || _i 

n+1 x(n) } [GSG'J 

+ || z(n)-h(x(n)»n) || . 1 

R (n) j (2.21 ) 

We may give W n+1 (x(n+1 ) ) an interpretation similar to the one given 
to V n (x(n)) in case i. Let c be any particular value of x(n+1). Then 
W n+1 (c) is a measure of the unlikeliness of the most probable sequence 
of states { x(o ),».., x(n+1 )] in which x(n+1 ) takes on the particular value 
c, given the observed sequence { z(o)„...,z(n)j and the a priori distribution 
for x(o). The estimate of x(n+1 ) based on the observed sequence { z_(o) t , 
£(n)] is that value of x(n+1 ) for which (x(n+1 )) is minimum,, 

The estimate of x(n) based on { z^(o) to . es z(n)] is the value of 
x(n) for which the following function is minimum; 

2 

v n (£( n )) = W (x(n)) + || z(n)~h(n),n) || 1 

R" 1 (n) (2.22) 

where V n (x(n)) has the same interpretation as in case i. Since the value 
of x(n) for which W n (x(n)) is minimum is the predicted value of x(n) given 
{z.(o)»®o® )} t we may give (2.22) the following simple intuitive inter- 

pretation. The new observation z(n) is weighted according to its reliability, 
R ^ , and used to modify the predicted value of x( n ) "to obtain the current 
estimate of x(n). The degree of uncertainty of prior knowledge is reflected 
in the sensitivity of V n (x(n)) to changes in x(n). 



If the basic system were linear, either of the two functional 
equations, (2.17) or (2 0 21 ) , could be solved analytically by assuming 

4 

a solution of the following form; 

|| x(n)-x(n) || + c(n) 

A(n) 

where A(n) is positive definite and is the inverse of the covariance 
matrix of the conditional distribution for x(n) given the observations, 
and where x(n) is the estimate and c(n) is a scalar. The recurrence re- 
lations obtained by Kalman [ 31 ] Tor the estimate and the covariance matrix 
can be obtained in this manner, but the algebra is quite involved. For 
a linear system, as we shall see shortly, an analytic solution may be 
obtained by other methods when the restrictions of cases i and ii above 
do not hold. In the literature [29, 31 » 32, 25] it has been assumed that 
the transition matrix F(k) is non-singular which corresponds to case i. 

It is perhaps worthwhile to review briefly how the computation 
would proceed using the functional equations. For greater detail the 
reader is referred to [3, 5]» 

For (2,17) and (2.19) w is said to be the decision vector. This 
is because an appropriate ''decision", w(n-1), may be associated with each 
possible state x(n) in such a manner that (2.17) or (2.19) is satisfied. 
For (2.21 ) the situation is slightly different in that x is both the 
decision vector and the state vector. From (2.21 ) we see that with each 
possible state x(n) we may associate an appropriate decision x(n-1 ). As 
the computation proceeds at each step the value of the "cost" function 
and the appropriate value of the decision vector may be calculated for 
each possible value of the state vector. An estimate is obtained by 
choosing the value of the state vector for which the "cost" function is 
minimum. 
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If one is interested only in estimating present state variables 
or predicting future state variables, there is no need to store old 
values of the decision vector for each possible value of the state 
vector. This is different from most dynamic programming problems. 

If it is also of interest to improve the original estimates of 
past states after the entire observation sequence has been observed, then 
the sequence of appropriate decisions for each possible state must be 
stored. Trading time for memory, this may be done on tape. The revised 
estimate of x(k-1 ) is specified by the decision vector associated with 
the revised estimate of x(k)„ 

The importance of the dynamic programming formulation of the es- 
timation problem lies in the possibility of processing each new observation 
as it occurs. Herein there is hope of solving the estimation problem in 
real time. Modern computers have adequate fast memory to handle second 
order systems in this manner. For higher order systems approximation 
techniques, such as polynomial approximations, may be used in conjunction 
with the functional equations. Computational aspects of dynamic programming 
are discussed in [5]« 

2,6 Two- Point Boundary Value Problems 

Let us now consider the problem of minimizing (2,9) which is 
rewritten here for convenience, 

I * 



2 n 2 



I n = i II £(°)-m || 




+ . + X'(k) [x(k+l)-f(x(k),k)~ 

j- 1 00 





(2.9) 
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Setting equal to zero the partial derivatives of I n with respect 
to w(k) , \(k), (for k=o, ... ,n-1 ) , and x(k) (for k=o,...,n) yields the 



following equations which x(kln) must satisfy; 

x(k+l|n) = f(x(k|n),k) + G(k)w(k) k=o,...,n-1 (2.23) 

w(k) = £(k)G» (k)A(k) k=o„ . . . ,n-1 (2.24) 

X_(k-1 ) = ( 3f * (x(kln),n)/ 5x) X(k) + ( ah' (x(k|n) ,k)/a x) 

R" 1 (k) [ z(k) - h(x(k|n) t n)] k=1,...,n (2.25) 

A(n) = o (2.26) 

x(o|n) = m + P(o) Oh" (x(o|n),o)/3x) R" 1 (o) [ z(o)-h(x(o In) ,o) ] 

+ P(o) ( 3 f * (x(o|n),o)/3x) A(o) (2.27) 



Combining (2.23) and (2.24) to eliminate w, we obtain necessary conditions 
in the form of the following two- point boundary value problem; 

x(k+1 In) = f(x(kln) s k) + G(k)£(k)G* (k) A(k) k=o,.,.,n-1 (2.28) 

The equation for A is (2.25) and the boundary conditions are given by 
(2.26) and (2.27). The more general problem of minimizing (2.13) yields 
the following similar two- point boundary value problem; 



x(k+1 1 n) = f(x(k|n),w(k) 9 k) 


k~ o 9 . . . f n— 1 


(2.29) 


w(k) = £(k) ( 3 f » (x(kln) 0 w(k))/ 3 w) A(k) 


k=o,... ,n-1 


(2.30) 



The remaining equations are identical to (2.2 5), (2.26) and (2.27) except 
that ( 3f (x(kln),k)/ <5x) is replaced by ( Sf(x(k|n),w(k),k)/ 3x) in (2.25) 
and (2.27). 



-4l=> 



These two-point boundary value problems are discrete analogues 
of the one obtained by Bryson and Frazier [lo]. 

2„7 The Linear Problem 

The linear version of the basic system is 

x(k+1 ) = F(k)x(k) + G(k)w(k) (2.31 ) 

z(k) = H(k)x(k) + v(k) (2.32) 

In this special case the two- point boundary value problem is 

x(k+1|n) = F(k)x(kln) + G(k)2(k)G«(k) A(k) k=o n-1 (2.33) 

A(k-l) = F»(k) X(k) + H’(k)R” 1 (k) [ z(k)-H(k)x(kln)] k=o„...,n-1 (2.34) 
with the boundary conditions 

x(oln) = m + P(o)H ! (o)R” 1 (o) [ z.(o)~H(o)x(oln)] + P(o)F' (o) >(o) 

(2.35) 

X(n) = o (2.36) 

2.71 Preliminary Discussion 

Before presenting the general solution to this problem, we 
shall first consider the simple cases n=o and n=1 in order to develop 
some feeling for the nature of the solution. 

For n=o, (2.35) and (2.36) may be combined to give 

x(o |o) = m + P(o)H* (o)R" 1 (o) [ z_(o)-H(o)x(ol o) ] (2.37) 



or 
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[I + PCojH'CojR" 1 (o)H(o)] x(o|o) = m + PCojH'CojR” 1 (o)z(o) 

The appropriate inverse will always exist for R(o) positive definite. 

Thus, 

x(olo) = f I + P(o)H* (o)R” 1 (o)H(o) J 1 [ m + P(o)H* (o)R*" 1 (o)z(o)] 

This may be rewritten in the more convenient form 

x(o|o) = m + [ I + P(o)H* (o)R” 1 (o)H(o) ] 1 P(o)H' (o)R“^ (o) [z(o)-H(o)m] 

(2.38) 

Let 

C(o) = [ I + P(o)H* (o)R” 1 (o)H(o) ] -1 P(o) (2.39) 

Then (2.38) becomes 

x(o|o) = m + ^(ojH^ojR” 1 (o) [ z(o)~H(o)m J (2.40) 

A comparison of the boundary condition (2.35) and (2.37) shows that 
(2.35) may be reduced by a similar procedure to the following form; 

x(o|n) =x(o|o) + C(o)F' (o)^(o) (2.41 ) 

The relation (2.41 ) is the key to the general solution, as we shall see 
shortly. 

For the case n=1 , (2.33) » (2.34) and (2.41 ) become respectively, 
x(l|l) =F(o)x(ol1) + G(o)2(o)G'(o)2,(o) (2.42) 

X(o) = H'djR^d) [z(1)-Hx(1|1)] (2.43) 

x(o|1) = x(olo) + C(o)F' (o)\.(o) (2.44) 
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The relations (2.42) and (2.44) may be combined as 



x(1 1 1 ) = F(o)x(olo) +[F(o)C(o)F»(o) + G(o)£(o)G» (o)] X(o) (2.45) 

Let 

P(1 ) = P(o)C(o)P» (o) + G(o)£(o)G* (o) (2.46) 

Using (2.46) and substituting for _X(o) from (2.43), we rewrite (2.45) as, 

x(lll) = F(o)x(olo) + P(1 )H» (1 )R _1 (1 ) [z(1) - H x(1|1)J (2.47) 

Let us note that F(o)x(olo) = x(1lo) by (2.12), and compare (2.47) with 
(2.37) c We come to the immediate conclusion that a procedure similar to 
the one used for n=o will yield an equation similar to (2.40). Let 

C(1) = [ I + P(1 )H« (1 )R =1 (1 )H(1 ) ] _1 P(1 ) (2.48) 

Then the equation corresponding to (2.40) is 

x(1 1 1 ) = x(1 lo) + C(1 )H* (1 )R =1 (1 ) [ z(1 )=H(1 )x(1 1 o)J (2.49) 

Equations (2.41), (2.46), (2.48) and (2.49) will now be shown to 
be similar in form to the solution for arbitrary n. 

2.72. Solution to the Linear Two~ Point Boundary Value Problem 

Theorem ; Consider the linear two-point boundary value problem given 
by (2.33), (2.34), (2.35) and (2.36) where P(o) and R(k) are positive 
definite for k=o,...,n. Then x(kln) satisfies the relation 

x(kln) = x(k| k) + C(k)F* (k)^A(k) k=o,...,n (2.50) 

vdiere x(olo) is given by (2.40) and 
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C(k) = [ I + PCkjH'CkjR” 1 (k)H(k) ] -1 P(k) (2.51 ) 

P(k+1) = F(k)C(k)F'(k) + G(k)£(k)G' (k) (2.52) 

x(kl k) = F(k-1 )x(k-1 Ik-1) + C(k)H* (k)R" 1 (k) 

[ z(k)-H(k)F(k-1 )x(k-1 Ik-1)] k=1 n (2.53) 



To establish a proof we proceed inductively showing that if (2.50) 
is satisfied by x(kln), then it is also satisfied for x(k+1|n). 

Proof ; Suppose that x(kln) satisfies (2.50) for some k € (o,...,n-l). 

Then, by (2.33). 

x(k+1 I n) = F(k)x(klk) + [ F(k)C(k)F ! (k) + G(k)£(k)G f (k)] A(k) 
Substituting P(k+1 ) from (2.52) and using (2.3^) » we obtain 

x(k+1 1 n) = F(k)x(kl k) + P(k+1 )H* (k+1 )R =1 (k+1 ) [ z(k+1 )-H(k+1 )x(k+1 |n)] 
+ P(k+1 )F'(k+1 )_A (k-H ) 



or 

[ I + P(k+1 )H> (k+1 )R =1 (k+1 )H(k+1 ) ] x(k+1| n) = F(k)x(kl k) 

+ P(k+1 )H' (k+1 )R =1 (k+1 )z (k+1 ) + P(k+1 )F> (k+1 )^(k+1 ) 

The needed inverse will always exist. After some manipulation, this yields 
the following equation? 

x(k+1 1 n) = F(k)x(kl k) + C(k+1 )H" (k+1 )R _1 (k+1 ) [ z(k+1 ) 

- H(k+1 )F(k)x(k| k) ] + C(k+1 )F' (k+1 ) A (k+1 ) 

Using (2.53) » we obtain the desired result? 
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x(k+1 1 n) = x(k+1 |k+1 ) + C(k+1 )F' (k+1 )A(k+1 ) 

To complete the proof we note that, by (2.41), the hypothesis is satisfied 
for x(oln). 

Discussion ; Note that no assumption is made that F(k) is non-singular. 

The recurrence relations (2,51 )» (2 .52) and (2.53) were first obtained by 
Kalman [31J in a somewhat different form and using a very different approach. 
Using the following matrix identities derived in Appendix A; 

[i + PH'pHh ] “ 1 PH'R" 1 = PH' [ HPH' + R J" 1 

[I + PH'R" 1 H ] =1 P = P - PH' [HPH» +R]" 1 HP 

we may combine (2.51 ) and (2.52) and rewrite (2.53) to obtain equations 
more closely resembling Kalman's; 

x(k|k) = x(kl k-1 ) + P(k)H'(k) [ H(k)P(k)H' (k) + R(k)l ” 1 

[ z(k)-H(k)x(k|k-1 )] (2 .54) 

P(k+1 ) = F(k) { P(k)-P(k)H« (k) [H(k)P(k)H» (k) + R(k) ] " 1 H(k)P(k) } F' (k) 

+ G(k)£(k)G » (k) (2.55) 

For the linear system, (2.31) and (2.32), all probability distribu- 
tions remain Gaussian. P(k) is the covariance matrix of the distribution 
for x(k) given [ z(o),... ,z (k-1 )] and C(k) is the covariance matrix of the 
distribution for x(k) given { z(o),... ,z(k)j . (see Appendix D) 

The solution of the problem of estimating the present state of the 
system is given by (2.51), (2.52) and (2.53), or equivalently by (2.51), 

(2.54) and (2.55)o In practice the decision to use either (2.52) and 
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Fig. 7 Noisy linear system and optimal estimator 
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(2,53) or the equivalent relations (2.54) and (2.55) would be made to 
minimize the dimension of the matrix to be inverted. A block diagram 
of the system and the estimator is given in Fig. 7. The optimal estimator 
is linear and consists of a model of the system in a feedback loop. Com- 
pare Fig. 7 with Fig. 4. 

If the estimate, x(k|n) is desired, one could proceed as follows; 

1. Compute and save x(klk) and C(k) for k=o,..,,n using (2.51 )» 

(2.52) and (2.53) or, equivalently (2.51 (2.54) and (2.55). 

2. Calculate x(kln) by backward recursion of (2.34) and (2.50). 

This procedure begins by calculating _\(n»1 ) using (2.34). 

Then obtain x(n-lln) using (2.50). Use (2.34) to obtain 
X(n^2) and then (2.50) to obtain x(n-2ln). Continue this 
alternating procedure until x(oln) is obtained. 

An alternate procedure would be to combine (2.50) and (2.34) to 
obtain the following equation for _A ; 

\(k-1) = [i - H' (k)R =1 (k)H(k)C(k) ] F'(k)A(k) 

+ H’G0lf 1 ( k ) [z(k)-H(k)x(k|k)] (2.56) 

The following matrix identities can easily be derived by the 
methods of Appendix A; 

I “ H'R^H [ I + PH’fHh ] “ 1 P = [I + H 8 If 1 HP J" 1 
= I - H* [ R + HFH ! ] ” 1 HP 

Using these identities, (2.56) may be rewritten in different forms. 

Then step 2 above can be replaced by the following; 
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2. Calculate _^(k) by backward recursion of a convenient 
form of (2,56). Then calculate x(k|n) using (2.50). 

By repeated use of matrix identities (2.56) may be written in 
the following form which later will be compared with other results; 

A0o-1) = { I - H'(k) [ R(k) + H(k)P(k)H' (k) ] “ 1 H(k)P(k)J F'(k)A(k) 

+ H' (k) rS(k)P(k)H' (k) + R(k) ] =1 [ z(k)-H(k)x(k|k-1 )] (2.57) 

The solution of the linear smoothing problem may be given in an 
even simpler form if we write the equations in terns of x(k|k-l). Later 
when we discuss the application of linearization techniques to nonlinear 
estimation problems, the formulas we shall obtain will be analogous to 
the ones already given for the linear problem,, since we shall wish to 
linearize about the point x(klk) rather than the point x(klk-l). 

Combining (2.50) and (2.54) using the matrix identities, we obtain 
the following equation; 

x(kln) =x(klk~1) + P(k)H* (k) [ H(k)P(k)H» (k) + R(k)] =1 

[z(k)~H(k)x(klk~1) J+ P(k){ I - H*(k) [ H(k)P(k)H> (k) + R(k)] 
H(k)P(k)} F* (k) X(k) 

Combining this relation with (2.57) yields 

x(kln) =x(k|k-1) + P(k)A(k=1 ) 

We may summarize this result in the following corollary; 

Corollary ; The solution of the linear smoothing problem is given by 
the following equations ; 
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x(k|n) = x(k|k-l) + P(k)A(k-1 ) 

x(k+1 1 k) = F(k)x(k|k-1) + F(k)P(k)H» (k)[ H(k)P(k)H» (k) + R(k)] " 1 
[ z (k) - H(k)x(kl k-1 ) ] 

where P(k) is given by (2.55) and _A(k=.1 ) is given by (2.57) for k=o,...,n. 

The a priori mean m is used for x(o 1-1 ) in the above expressions and 
A.(n) = o. 

2.73 The Existence of [I + PH'iHh J" 1 

In the following discussion we shall use some matrix identities 
and the following elementary facts from matrix theory developed in Appendix A 

i. The inverse of a positive definite matrix is positive 
definite. 

ii. The matrix formed by adding a positive definite matrix 
to a non-negative definite matrix is positive definite 
and hence, has an inverse. 

One may verify by direct multiplication that the unique inverse 
of I + PH’R" 1 ! is 

I - PH* [HPH‘ +R]“ 1 H (2.58) 

If ,R is positive definite and P is non-negative definite, (2.58) will 
always exist by virtue of ii. Using (2.58), C(k) may be written in the 
following form; 

C(k) = P(k) - P(k)H»(k) [ H(k)P(k)H‘Xk) + R(k)] _1 H(k)P(k) (2.59) 
The relation (2.59) is a generalization of a lemma given by Ho [ 25], who 
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required that P(k) be positive definite. 

While the inverse of a matrix is unique, it frequently may be 
written in various forms which bear little resemblance to each other. If 

A 

P(k) is positive definite for example, the inverse of [ I + PH ' R H ] may 
be written in the following form; 

[ ff 1 + H , R~ 1 H]" 1 P~ 1 (2.60) 

which surprisingly is identical to (2.58). Using (2.60), we may write 
(2.51 ) in the following form; 

C(k) = [ P" 1 (k) + H» (k)R" 1 (k)H(k)] _1 (2.61 ) 

From (2.61) and ii it follows that if P(k) is positive definite, then C(k) 
is positive definite. This is in accord with the fact that P(k) and C(k) 
are the covariance matrices of the probability distributions for x(k) 
given ( z(o),...,z(k-1 )] and { z(o),... ,z(k) j respectively. A singular co- 
variance matrix signifies that x(k) is confined with probability one to a 
hyperplane in n-spaee. It is clear that this cannot happen as the result 
of measurements which are corrupted with noise having a non-singular co- 
variance matrix R(k). 

Consider (2.52), which we rewrite for convenience, 

P(k+1 ) = F(k)C(k)F» (k) + G(k)£(k)G« (k) (2.52) 

From (2.52) we see that a sufficient condition for P(k+1 ) to be positive 
definite is that either G(k)£(k)G* (k) be positive definite or C(k) be 
positive definite and F(k) be non-singular. This may easily be related 
to the basic equation for the system, 
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x(k+1 ) = F(k)x(k) + G(k)w(k) 



(2.31) 



From (2.31 ) we see that P(k+1 ) will be positive definite if every point 
in n-space may be reached in one transition from some point in the set 
of possible values of x(k) (specified by C(k) ) using some w(k). Thus, 
the condition obtained from (2.52) is seen to be sufficient but not nec- 
essary. 

Finally, we note that if P(o), R(k) and F(k) are non-singular for 
all k, then P(k) and C(k) are also positive definite for all k. 

2.7*1- Generalizations 

The restrictions that P(o) and R(k) be positive definite may be 
relaxed without altering the basic structure of the solution. In order 
to relax these restrictions we use the generalized inverse discussed in 
Appendix B, Appropriate background material concerning probabilistic 
aspects of singular covariance matrices is given in Appendix C. In this 
section we shall simply state the results obtained in Appendix D and com- 
pare them with those already obtained in this chapter. For a more detailed 
discussion the reader is referred to Appendix D, which considers a very 
general version of the linear estimation problem in which correlation 
between v and w is allowed and the covariance matrices P(o) and R(k) may 
be singular. 

The solution of the linear estimation problem is given by the 

1 

following equations which are the same as (2,50), (2,59), (2.52), (2.5*0 
and (2,57) except that in (2.59), (2.5*+) and (2.57) the inverse is re- 



placed by the generalized inverse; 

x(k|n) = x(k|k) + C(k)F'(k)^(k) k=o,...,n (2.50) 
C(k) = P(k)-P(k)H'(k) [ H(k)P(k)H«(k) + R(k)J # H(k)P(k) (2.62) 
P(k+1 ) = F(k)C(k)F« (k) + G(k)£(k)G' (k) (2.52) 
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x(k|k) = x(k Ik-1 ) + P(k)H»(k)f H(k)P(k)H'(k) + R(k)] # 

[ z(k)-H(k)x(klk-1)] k=1 , ... ,n (2.63) 

A(k-1 ) = { I - H* (k) [ R(k) + H(k)P(k)H* (k)] # H(k)P(k) j F» (k)A(k) 

+ H»(k) [ R(k) + H(k)P(k)H» (k)] # [ z(k)-H(k)x(kl k-1 )] 

k=1,...,n (2.64) 



The boundary conditions are 



x(olo) = m + P(o)H*(o) f H(o)P(o)H'(o) + R(o)] # [ z(o) ■- H(o)m] (2.65) 



A.(n) = o 



The generalized inverse may be replaced by any pseudo-inverse. The 
pseudo-inverse was used by Kalman [29 ] in his solution of the linear fil- 
tering and prediction problem. We have extended this work to include the 
smoothing problem and have relaxed the restriction that F(k) be non-singular. 
The only restrictions remaining are that R(k) and P(o) be valid covariance 
matrices and that observations do not occur which are inconsistent with 
prior knowledge. This point is discussed In Appendix C. The block diagram 

of Fig. 7» in which the estimator producing up-to-date estimates of present 

■ > 

state variables consists of a model of the system in a linear feedback 
loop, remains valid if C(k)H t (k)R =1 (k) is replaced by 

P(k)H ' (k) [ H(k)P(k)H» (k) + R(k) ] * 

These two expressions are equivalent if R~^ exists. 



If one is interested in the linear smoothing problem, the corollary 
of section 2.72 remains valid if the inverse in the equation for x(k+l|k) 
is replaced by the generalized inverse. 
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2.8 An Approximation Technique 



2 e 8l Linear Observations 

There is a great need for approximation techniques which can be 
used to obtain estimates in real time. In this section we shall present 
a method which is computationally attractive and which possesses some 
intuitive appeal. 

For simplicity we shall first assume that h(x) is linear, an 
assumption which will appreciably simplify the resulting equations and 
which represents a case of great practical significance. In this case we 
say that the observations are linear. Consider the approximation to (2.1 ) 
obtained by the following linearization! 

x(k+1 ) ~ f (x*(k|k)„k) + ( 5f(x*(k|k),k)/S x) [ x(k)~x*(k| k)] + G(k)w(k) (2.66) 

The point x*(k|k) about which the linearization is made is the estimate 

of x(k) given { z^(o )„..., ^(k)}. We use the asterisk to remind ourselves 

that the estimates produced by this linearization procedure are only 

approximations and do not necessarily correspond exactly to the mode of 

the a posteriori distribution. 

If the function I of (2.9) is replaced by 

2 n 2 

I n = £l|x(o)-jn || + X] i l|z(k)-H(k)x(k) || 1 

n P“ (o) k=o R“V) 

n i 2 

+ £l|w(k)|| + ^> (k) [[x(k+1 )-f (x*(klk)„k)] 

k=o 1 2” 1 (k) u 

- ( df(x*(k|k)„k)/5x) [ x(k)~x*(k|k)] „ G(k)w(k)j} (2.6?) 

and the partial derivatives of I*, with respect to w(k), \_(k) and 
x(k) are set equal to zero, a modified version of the two-point 
boundary value problem is obtained. The resulting 



equations are as follows; 



x* (k+1 1 n) = f(x*(klk) s k) + (3 f (x*(kl k),k)/a x) [ x*(k |n)-x*(k Ik) J 

+ G(k)£(k)G* (k)A(k) (2.68) 

\(k=1 ) = (3f«(x*(klk),k)/3x) A(k) + H> (k)lf 1 (k) [ z(k)-H(k)x*(k In)] 

(2.69) 

The boundary conditions are 

X(n) = o (2.70) 

x*(o In) = m + P(o)H' (o)R” 1 (o) [ z,(o)-H(o)x*(o| n) ] 

+ P(o)(3f’(x*(o|o),o)/Sx) X(o) (2.71) 

A comparison of (2.71) and (2.35) indicates that (2.71) may be written 
in the following form which is similar to (2.41 ); 

x*(oln) =x*(°l°) + C*(o)(s f 1 (x*(olo),o)/a x) ^( 0 ) (2.72) 

where x*(olo) and C*(o) are equal to x(o|o) and C(o) respectively and are 
given by (2.40) and (2.39). It is not at all surprising that the solution 
of this linearized problem closely resembles that of the linear problem 
already studied. We define the following quantities; 

C*(k) = [ I + P*(k)H> (k)R” 1 (k) ] _1 P*(k) (2.73) 

P*(k+1) = ( of (x*(klk) ,k)/ 3x) C*(k) ( 3 f ' (x*(klk),k)/a x) 

+ G(k)£(k)G'(k) (2.74) 

P*(o) = P(o) (2.75) 
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We now hypothesize that the x (k|n) satisfying the two-point 
boundary value problem (2.68), (2.69), (2.70) and (2.71) is given by 
the following equations? 

x*(k|n) = x*(k|k) + C*(k)(a f ' (x*(klk),k)/s x) A(k) k=0„...,n (2.76) 



where 

x*(k|k) = f (x*(k-1 |k-1 ) f k-1 ) + C*(k)H' (k)R" 1 (k) 

[ z(k)-H(k)f(x*(k-l|k-l),k-1)J k=1 .... ,n (2.77) 

and x*(o|o) is given by (2.40). We again proceed by induction showing 
that if (2.76) is satisfied for some k G (o,...,n-l) then it is also sat- 
isfied for k+1 . We know that (2.76) is satisfied for k=o. The steps in 
the following proof are completely analogous to those used to prove the 
theorem of section 2.72. 

Suppose that x (k|n) satisfies (2.76) for some k € (o,...,n-l). 

Then, by (2.68) 

x*(k+1|n) = f(x*(k|k),k) - ( a f(x*(klk),k)/0x) x*( k * k ) 

+ (sf(x*(klk),k)/3 x) x*(klk) 

+ { (3f(x*(k|k),k)/3x) C*(k) ( 3 f ‘ (x*(kl k) ,k)/s x) 

+ G(k)£(k)G'(k)j A(k) 

or, equivalently 

x*(k+1ln) = f(x*(klk),k) + P*(k+1 ) X(k) 

Substituting for X(k) from (2.69) this expression becomes 
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x*(k+1 1 n) = f (x*(klk),k) + P*(k+1 )H» (k+1 JR" 1 (k+1 ) 

[ z(k+1 ) - H(k)x*(k+1 1 n) ] 

+ P* (k+1 )(3 f * (x*(k+1 I k+1 ),k+1 )/d x) ^(k+1 ) 

3|t 

Solving this expression for x (k+1 In) yields 

x* (k+1 In) = [ I + P* (k+1 )H« (k+1 )R ra1 (k+1 )K(k+1 )] -1 

| f (x*(k Ik) ,k) + P*(k+1 )H* (k+1 )R _1 (k+1 )z(k+1 ) 

+ p* (k+1 ) ( a f * (x* (k+1 1 k+1 ) , k+1 )/ a x) A(k+1 ) ] 

This expression may be rewritten in the following form; 

x* (k+1 1 n) = f (x*(k|k),k) + C*(k+1 )H* (k+1 JR" 1 (k+1 ) 

[ z/k+1 )-H(k+1 )f (x(k Ik) ,k) ] 

+ C*(k+1 ) ( 3 f * (x* (k+1 1 k+1 ) ,k+1 )/a x) \ (k+1 ) 

Using (2.77), this expression becomes 

x*(k+ll n) = x*(k+1 I k+1 ) + C*(k+1 )(3 f ' (x*(k+1 1 k+1 ),k+1 )/a x) ^(k+1 ) 

This expression satisfies the hypothesis (2.77) and noting that (2.77) 

is satisfied for k=o concludes the proof. 

Equations (2.73), (2.7*0 and (2.77) constitute a recursive scheme 

in which each new observation can be processed immediately to produce an 

$ $ 

up-to-date estimate. The matrices C and P can no longer be strictly 
interpreted as covariance matrices of the distribution for x, as they 
could be if the basic system were actually described by (2,66) instead 
of (2.1). 
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Fig„ 8 Nonlinear discrete- time estimator 
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A block diagram of the estimator described by equation (2.77) is 
given in Fig. 8. Again we see the model of the system appearing in a 
feedback loop. Equation (2.77) or, equivalently, the estimator of Fig. 8 
may be given a simple intuitive interpretation if we interpret the in- 
dividual terns in the following mariner; 



f (x*(k-1 |k-1 ) ,k-1 ) 
C*(k) 

H‘(k) 

IT 1 (k) 

H(k)f(x*(k-l|k-1),k-1) 



a prediction of x(k) 

a measure of present uncertainty 

sensitivity of observation to deviations in x(k) 

reliability of measurements 

a prediction of z.(k) 



The equation then has the interpretation that the error in the prediction 
of z_(k) is weighted in proportion to present uncertainty, sensitivity of 
the observation, and reliability of the measurement. This weighted error 
is then used to modify the predicted value of x(k) to obtain an up-to-date 
estimate of x(k). 

Equations (2.73)* (2.7*0 and, (2.77) or equivalent equations ob- 
tained by the use of matrix identities are particularly suitable for real 
time implementation on a digital computer. Simulation studies may be 
conducted to evaluate the applicability of this approximation to a 
particular system. The results of some simulation studies using this 
linearized estimation technique are given in Chapter V. 

When a detailed analysis of all available data is to be made after 
the completion of some experiment such as a rocket flight, one is faced 
with numerical solution of the nonlinear two-point boundary value problem. 
Some method of successive approximations, such as a version of the gradient 
method of Kelley [35 ] and Bryson [ 9] * is usually called for. There is 
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danger of converging to a local minimum if the conditional probability 
density function (2 .7) is not unimodal. This danger is reduced and the 
speed of convergence increased if a good first approximation is available. 

The logical choice for such a first approximation is x*(k|n) which may be 
calculated by backward recursion of (2.69) and (2.76). Alternately, we 
may combine (2.69) and (2. 76) to obtain the following equation for \ 
which is similar to (2.56)5 

A(k~1) = [ I - H'Mff 1 (k)H(k)C*(k) ] (a f' 1 (x*(kl k) ,k))/a x) A(k) 

+ H'(k)R" 1 (k) [ z(k) - H(k)x*(klk)] (2.78) 

Then x*(kln) may be calculated using (2.78) and (2.76). 

2.82 Generalizations 

Having obtained some confidence that linearization techniques will 
produce equations closely resembling the equations for the linear estima- 
tion problem, there are several directions in which one could generalize. 
First, one could complicate the problem without introducing new nonlinear- 
ities by allowing R(k) and P(o) to be singular and permitting correlation 
between w and v. That is, one could study the problem of Appendix D using 
(2.66) in place of (2.1 ). Second, one could introduce new nonlinearities 
by allowing h to be nonlinear and/or allowing w to enter nonlinearly as 
in (2.3). Third, one could combine these first two types of complications. 

We shall discuss each of these possibilities briefly, indicating 
how the results obtained previously would be modified in these situations, 
but omitting detailed derivations. 

The generalization to include singular R(k) and P(o) and allow 
correlation between v and w is completely straightforward. The derivation 
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of these results is completely analogous to section D5 of Appendix D 
and may easily be obtained by using section D5 as a guide. The basic 
equations which result from such a derivation are as follows; 

x*(k|k) =x*(k|k~1) + P*(k)H'(k)B*(k) [ z(k)-H(k)x*(klk-1)] (2.79) 

x*(k+1 1 k) = f(x*(k|k),k) + G(k)S(k)B*(k) [ z(k)-H(k)x*(klk-1 )] (2.80) 

x*(k|n) = x*(klk) + P*(k) [ (3 f * (x*(klk),k)/s x)-H* (k)M*(k) ] X(k) 



k=o ; o o o ( 2 © 8 1 ) 

>(k-l) = [ ( 3 f'(x*(klk) 5 k)/ax) - H*(k)M*(k)] >(k) 

+ H'(k)B*(k) [ z(k)~H(k)x*(k|k-l)] k=1,„..,n (2.82) 

P*(k+1) = (& f(x*(k!k),k)/ ax) P*(k) (af 1 (x*(kl k) ,k)/s x) 

+ G(k)£(k)G'(k) - M*’(k)B*#(k)M*(k) (2.83) 

B*(k) = [ H(k)P*(k)H* (k) + R(k)] # (2.84) 

M*(k) = B*(k) [ H(k)P*(k)( af 11 (x*(kl k),k)/ a x) + S*(k)G*(k)] (2.85) 



A block diagram of the estimator described by equation (2.79) and 
(2.80) is given in Fig. 9 * Again we have a model of the system appearing 
in a feedback loop. 

If new nonlinearities are introduced, then we are forced to further 
linearization. We shall first consider the case in which h is nonlinear. 

In order to acquire a feeling for this situation we shall examine the 
simplest possible example, n=o. Then, 

2 2 

I o = Il|x(°)-Sill i + i II z(o)-h(x(o),o) II 

F" 1 Co) R~ (o) (2.86) 
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Dashed lines indicate x (k-llk~1) is needed to calculate P*(k) and B*(k) 



Fig e 9 Nonlinear discrete- time estimator when v(k) and w(k) are correlated,, 
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or, equivalently. 



2 2 

I = 1 1| x(o)-m || + | II z(o)-h(m,o)- [ h(x(o) ,o)-h(m,o)] || 1 

" " r(o) r(o) 

(2.87) 

This may be rewritten as 



2 2 2 
I = -§||x(o)-m|| . + i||z(o)-h(m,o)|| 1 + i||h(x(o),o)=h(m,o)|| 1 

P (o) R” (o) R"'(o) 

- [ h(x(o),o)-h(m,o)]' JT 1 (o) [z(o)~h(m,o)] (2.88) 

The second term in this expression is independent of x. The third and 
fourth terms we expand about the a priori mean m retaining the first non= 
zero terms. The fourth term becomes 

(x(o)-m)' (ah’ (ffl,°)/3x) R” 1 (o) [ z(o)~h(m,o)] (2.89) 

The third term becomes 

j- (x(o)-m)‘ { 3[(3h , (l.o)/sx) R" 1 (o) h(m,o)]/ax j (x(o)-m) (2.90) 

Let 

C =1 (o) = P^ 1 (o) + 3 [(3h ! (m„o)/ax) R =1 (o) h(m,o)] /ax (2.91) 

Then (2.86) may be approximated by the following expression 




i !|x(o)-m || + i II z(o)-h(m,o) || 

C- 1 (o) R~ (o) 

+ (x(o)~m) ! (ah’ (m,o)/<£>x) R =1 (o) [ z(o)~h(m,o)] 



(2.92) 
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Differentiating this expression with respect to x(°) an ^ setting the 
result equal to zero yields 

x*(o lo) = m + C(o)( 3 h' (m,o)/a x) IT 1 (o) [ z(o)-h(m,o) ] (2.93) 

Equations (2.91) and (2,93) are similar in structure to (2,72) and (2,77), 

If we interpret (3h'(m,o)/3x) as the sensitivity of the observations to 
deviations of x from the prior mean, we may give (2,93) and interpretation 
similar to that which was given in (2,77) « The same basic structure is 
retained for arbitrary n. Note that h(x(k),k) is linearized about 
x (k|k-l) since x (k|k) cannot be computed until z(k) is observed. The 
expression for C(k) is of the same form as the expression for C(o) given 

i 

by (2,91), P(k+1 ) is computed by (2,74) as before. 

Let us again consider the case n=o, but now allow R(o) to be sing- 
ular. In this case, following Appendix D, we consider v to be the result 
of a linear transformation on a normalized random vector u. That is, 
v = Tu where TT' is equal to R(o). Introducing the Lagrange multiplier 
vector the expression to be minimized becomes 

2 2 

Iq = 2 l|x(°)-S il , + ill a II + i* [ 2 (o)-h(x(o),o)~Tu] (2.94) 

r 1 (o) 

This expression will be approximated by the following linearized expression; 

2 2 

Iq = ills(o)-m|| . + ill a II + I' f z(o)»h(m,o)-(a h(m,o)/3x) 

r 1 (o) 

[x(o)-m] - Tu | (2.95) 

Setting the total differential, dI*(x(o),u,£), equal to zero yields the 
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following equations; 



x*(o lo) = m + P(o)(3 h' (m,o)/a x) ? 


(2.96) 


s|( 

z(o) = h(m,o) + (•& h(m,o)/sx) [ x (oio)-m] + Tu 


(2.97) 


u = T' ? 


(2.98) 



Combining these three equations, we obtain 

z(o)-h(m,o) = [ (3h(m,o)/ax) P(o) (a h 1 (£.o)/a x) + R(o)J §_( o ) (2.99) 

Using the generalized inverse to solve for a ^(o) and combining (2.96) 
and (2.99) yields 

x(o) = m + P(o) (3h'(ro,o)/sx) [ ( Sh(m,o)/c> x) P(o) 

(3 h' (m,°)/d x) + R(°) [z(o)=h(m,o)] (2.100) 

Note that (2.93) and (2.100) are not the same, nor can they be 
shown to be equal using matrix identities even if R“^ (o) exists. The 
reason for this situation is that if R”^ does not exist we are forced to 
linearize earlier than if R”^ exists. The procedure we have used for 
singular R is equivalent to approximating the second term in (2.88) by 
the following expression; 

2 

j l|z(o)»h(m,°)-(3h(m,o)/0x) [x(o)-m]|| 1 

R (o) 

Here the approximation takes place before the multiplication; whereas, 
previously we were able to multiply and then approximate the resulting 
expression. 

The basic structure of the situations which arise when h is non- 



linear is well illustrated by the simple case n=o. The required modifica- 
tions of our previous results are evident in (2.91 )» (2.93) and. (2.100). 

Suppose that w entered the system nonlinearly. We would then 
approximate 

x(k+1 ) = f(x(k), w(k), k) (2.3) 



by 

x(k+1 ) ^ f(x*(klk),o,k) + (a f (x*(k Ik) ,o,k)/ a x) 

[x(k)-x*(klk)] + ( a f (x*(k|k),o,k)/a w) w(k) (2.101) 

It is clear that (a f (x*(klk) ,o,k)/3 w) will play the role of G(k) in 
our previous work. 

The reader should have no difficulty in combining these results 
to meet the needs of any particular problem. How well these approximation 
techniques will work will, of course, depend on how well the system is 
approximated, the frequency of measurements , noise levels and other 
factors peculiar to the individual problem. Similar linearization schemes 
have been developed from another point of view in recent independent 
studies by other investigators [37k 59] who give encouraging results. 

2.83 Linearity in Disguise 

In certain special cases the approximation of (2.1) given by (2.66) 
becomes exact and equations given by the linearization technique are 
strictly correct. We rewrite (2.1 ) and (2.66) for convenience. 

x(k+1 ) = f (x(k),k) + G(k)w(k) (2.1 ) 

x(k+1) = f(x*(k|k),k) + O f(x*(k|k),k)/c> x) [ x(k)-x*(k|k)] + G(k)w(k) 
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( 2 . 66 ) 



The first case is the trivial one in which f(x(k),k) is linear. That is, 
when (2.1 ) takes the following form; 

x(k+1 ) = F(k)x(k) + G(k)w(k) (2.31 ) 

The second case arises when all state variables are exactly measurable 
and x (klk) is equal to x(k). 

Various combinations of these two cases may arise in which it is 
not so obvious that the underlying estimation problem is linear even though 
the basic equations are nonlinear. Consider the case in which the state 
vector x may be decomposed into an exactly observable part Xj and a re- 
maining part Xg" The underlying estimation problem will be linear if the 
basic system is of the following form; 



(k+1 ) 




(£, (k),k) 


£-1 2 (^) 9^0 




£) (k) ' 


+ 


J 

JO 
1 


Xg (k+1 ) 




0 


i 

s-/ 

(M 








G, 2 (k) _ 



( 2 . 102 ) 

z(k)=x 1 (k) (2.103) 

In this case z may be substituted in the equation for x^ to produce the 
following relation; 

z(k+1) - Z-| 1 (^(k) ,k) z(k) = F 12 (z(k),k) Xg(k) + G^k) w(k) (2.104) 

The relation shows that a difference of known quantities is equal to a 
linear combination of the unknown part of the state vector plus additive 
Gaussian noise. This, together with the following linear equation for x^ 

Xg(k+1) = F 22 (k) x 2 (k) + G^(k) w(k) (2.105) 

shows that the estimation problem is linear and the distribution for 



remains Gaussian even though the basic system (2.102) is nonlinear. 

For example, consider the following simple system with a random 
parameter X£$ 

x 1 (k+1 ) = [ a + x^k) ] x 1 (k) + w 1 (k) 

x 2 (k+1) = b XgCk) + w 2 (k) 

z(k) = x 1 (k) + v(k) 

If the additive noise v is not identically zero the estimation problem 
is nonlinear and the use of the linearization technique would give only 
an approximate solution to the estimation problem. If v is identically 
zero this is a special case of (2.102). Then the estimation problem 
is linear and the linearization technique gives the exact solution to 
the estimation problem. We see from this simple example that the validity 
of the approximation may decrease as noise levels increase. This is in 
addition to the expected decrease in the performance of even the optimal 
estimator when noise levels increase. 

As a second example, consider the following system in which the 
transition matrix F(k) is random; 

x(k+1 ) = F(k)x(k) + &, (k)w(k) (2.106) 

z(k) = x( k ) (2.107) 

The elements of the transition matrix F(k) are considered to be the out- 
puts of a linear system excited by an independent noise sequence. In 
order to see how such a situation would be handled we shall examine the 
following second order system; 
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x 1 (k+1 ) 




rf„(k) 


fl 2 (k) 




x 1 (k) 


+ G.j (k)w(k) 


x^k+1 ) 




JX 

1 


f 22< k > - 




_x 2 (k) 





z.j (k) = x 1 (k) z 2 (k) = x 2 (k) 



Let us augment the state-space by adjoining the random parameters to 



the state vector as follows; 







r 






f^Oc+1) 




*3 (k+l) 




x^(k) 


f 12 ( k+1 ) 





x^(k+1 ) 


= A(k) 


x 4 (k) 


Vk+o 




x 5 (k+1 ) 


x^(k) 


r 22 (k + i ) 




x 6 (k+1 ) 




x 6 (k) 






, 




- 



G 2 (k)w(k) 



Then the equation for the augmented system may be written in the fora of 
(2.102) as follows; 



x 1 (k+1 ) ' 




,x 1 (k) x 2 (k) 0 0 




x 1 (k) 




o' 




Xg(k+1 ) 




, 0 0 x 1 (k) x 2 (k) 




. x ? (k) . 


x^(k+1 ) 




l 

\ 




x^(k) 


x 4 (k+1 ) 




0 , A(k) 




x 4 (k) 


x 5 (k+1 ) 




1 

1 

1 




x^(k) 


x 6 (k+1 ) 




1 

\ 




x 6 (k) _ 



—2 



The basic technique used for the second order example is readily 
extended to the nth order case. 

The fact that a nonlinear estimation problem may become linear 
if certain noise levels go to zero can be a useful tool in investigating 
nonlinear estimation problems. If we observe that such is the case for 
a particular estimation problem, we immediately acquire an upper bound 
on the performance that can be obtained. In addition, we may gain 



insight into the coupling between various noises and the quantities to 
be estimated# For example, in (2# 104) we see that the random input w 
is playing the role of additive measurement noise in the problem of 
estimating Xg# 

Lest we become complacent, consider what would happen if for 
(2.102) we were able to observe only Xg# It is evident that we would 
acquire no new information about x^ # This difficulty raises the inter- 
esting question of observability which will be discussed in Chapter IV# 

2.84 Successive Linearization 

If one is interested in a problem of ex post facto data analysis 
the job of obtaining a numerical solution of the two- point boundary value 
problem arises. As was mentioned earlier, a logical first approximation 
to the solution would be x*(k|n). One method that might be considered 
is that of successive linearization# This method is recommended on the 
grounds that it is comparatively simple and that rapid convergence can 
be expected. Unfortunately, convergence cannot be guaranteed. This is 
the price to be paid for simplicity. In a problem where there is no re- 
quirement for real time solution it may be worthwhile to try this method 
first in hopes of simply obtaining a fast solution while running a slight 
risk of non- convergence. The results of some encouraging applications 
of this technique are given in Chapter V. 

The idea behind the method of successive linearization is this. 

* 

The estimate x (k+1 1 k+1 ) is generated by linearizing about the point 
x*(k|k)# After the entire observation sequence [ ,z (o) , # . . ,z.(n) \ has 
occurred, the smoothed sequence £ x*(k |n)] , being based on more measurements, 
would hopefully be a better trajectory about which to linearize than 
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[x*(k|k)j was. One can then use a modified version of (2.77) obtained 
by linearizing about the sequence f x*(k|n)I to reprocess the observations 
and obtain a sequence { Xg(klk)l . This sequence should be better than 
[x*(klk)£ , since the linearization took place along a better trajectory. 

This sequence is then smoothed using modified versions of (2.76) and (2.78) 
to obtain a second approximation to the solution {x^kln)?. The procedure 
is then repeated linearizing about { x^kln)] . 

In the development of the equations for this technique we shall 
designate the sequence [ x*( k ln)j by {x^(k|n)j to emphasize that it is the 
first approximation to the solution. Suppose that we have obtained the 
ith approximation sequence (k| n) and we seek approximation i+1 . Con™ 
sider the approximation to (2.1) given by the following linearization; 

x(k+l)^ ^^(klnj.k) + (af(x i (k|n),k)/-0x) [ x(k)~x i (k|n) ] 

+ G(k)w(k) (2.108) 

If this expression is substituted into (2.9) a modified two-point boundary 
value problem is obtained; 

£^ +1 (k+1 1 n) = ffe^klnj.k) + ( Sf (x^k |n),k)/s x) [ x i+1 (kln)-x i (kln) ] 

+ G(k)£(k)G*(k)X(k) (2.109) 

X(k-1 ) = ("Sf ' (x i (kln) 9 k)/B x) A(k) + H* (k)R” 1 (k) [ jz(k)=H(k)x i+1 (k| n)] 

( 2 . 110 ) 

The boundary conditions are 

X(n) = o (2.111) 

-i+1 (° ,n ^ = 2 + £(°)S‘ (°)£ =1 (°) [ z(o) (oln)] 

+ P(o) (3 £' (^(oln),o)/3x) X(o) (2.112) 
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For simplicity we first assume that the observations are linear 
and that R(k) is positive definite . From (2„1 09) we see that if it con- 
verges, the successive linearization technique converges to a solution 
of the nonlinear two-point boundary value problem,, 

The boundary condition (2,112) may be rewritten in the following 

form; 

x^ +1 (o|n) =x i+1 (o|o) + C^oXs fKx^oInJ.oJ/ax) \( 0 ) (2,113) 

where x^ (o |o) and C^o) are x(o|o) and C(o) respectively as given by 
(2,39) and (2.40). 

As usual, we assume a solution of the following form; 

— i+ 1 (kin) =£ L+1 (k|k) + £,(^(3^ ' (^(kln) ,k)/a x) ^(k) (2.114) 

Substituting (2.114) into (2.109) yields 

2l + 1 (k+1 1 n) = 3^ +1 (k+1 1 k) + P^k+1 ) X (k) (2.11 5) 

where x^ +1 (k+1 |k) and P^(k+1 ) are defined as follows; 

x i+1 ( k +H k ) =K^( k ln), k ) + (3 £(2j_(k |n) ,k)/3 x) 

[ ( k l k ) - x^kM] ( 2 . 116 ) 

P^(k+1 ) = (3 ^(^(klnjjk)/© x) C^k) (a f ' (^(klnjjk)/^ x) 

+ G0c)2( k )G' (k) (2.117) 



Combining (2.115) and (2.110) yields 



x^ (k+1 1 n) = x ±+1 (k+1 1 k) + P ± (k+1 )H l (k+1 JR" 1 (k+1 ) [ z(k+1 ) - H(k+1 )x ±+1 (k+1 1 n) ] 

+ PCk+1 )(3f'(^(k+1 |n),k+1 )/bx) \ (k+1 ) (2.113) 

Solving (2.118) for x (k+1 In) we obtain the following relation which 
satisfies hypothesis (2.114); 

x^ (k+1 1 n) = x (k+1 I k+1 ) + C^(k+1 )(’df l (x ± (k+1 In), k+1 )/a x) A (k+1 ) 

(2.119) 

where 

x„ „ (k+1 1 k+1 ) = x, . (k+1 Ik) + C. (k+1 )H« (k+1 JR" 1 (k+1 ) 

[ z (k+1 ) - H(k+1 )3^ +1 (k+1 1 k) ] (2.1 20) 

and 

C„ (k+1 ) = [ I + P. (k+1 )H' (k+1 )R _1 (k+1 )H(k+1 )K " 1 P, (k+1 ) (2.121 ) 

We have now derived a familiar looking set of equations . The 

sequence{x. (klk)j may be calculated using (2.116), (2.117), (2.120) 

1 ' « 

and (2.121). The i+1 approximation to the solution f x^ (k|n)] may be 

obtained using (2.120) and (2.110). 

■) 

These results may be put in an alternate form; combining (2.110), 

(2.114) and (2.120) yields 

A(k-1) = [I - H«(k)R =1 (k)H(k)C i (k)J (Bf'(x i (kln),k)/©x) A(k) 

+ [ I - H 1 (k)R =1 (k)H(k)C i (k) ] H» (k)R” 1 (k) [ z(k)~H(k)x i+1 (k |k-1 1 

(2,122) 

By repeated use of matrix identities this equation may be written in the 
following form corresponding to (2 .57); 
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X(k-1 ) = { I - H* (k) [ H(k)R(k)H» (k) + R(k)] ” 1 H(k)P(k) ] 

O f»(^(kln),k)/3x) 2.(k) + H'(k) [ HtkJP^kjH’ (k) + R(k)J " 1 

[z(k) - H(k)x(k|k~1 )] (2.123) 

Combining (2« 1 1 6) and (2.120) and using the matrix identity (A.2) yields 

-i+1 (k +1 1 k) = ffe^klnj.k) + ( s £(£i(k|n)»k)/c)x) - 2j.(k|n)] 

+ (df( 3^( k ln),k)/©x) P ± (k) H'(k) [ H(k)P(k)H> (k) + R(k)] " 1 

[ z(k) - H(k)x i+1 (k|k-1 )] (2.124) 

We may also write the following equation for P^(k) by combining (2.117) 
and (2.121) and using (2.58)5 

4 (k+1) = (0f(x ± (kln)»k)/3x)[ P ± (k) - P ± (k)H'(k) [ H(k)P(k)H» (k) + R(k)] ^ 

H(k)P(k)j (Sf*( 4(k|n) s k)/0x) + G(k)£(k)G« (k) (2.125) 

We may summarize the method of obtaining the i+1 approximation 
sequence [ x^ + .j (k|n) ] from the previous approximation sequence { x^(kln) } 
in the following three~step procedure. 

1. Calculate the sequences £ x^ + .j (k I k-1 )} and P^(k) for 
using (2.124) and (2.12 5) respectively. 

The procedure begins by setting x^ (o|»l) and 4(°) 
respectively equal to m and P(o) of the a priori dis- 
tribution for x(o). 

. Calculate the sequence { X(k)j for k= ~1„...,n by back- 
ward recursion of (2.123) using the boundary condition 



2 



3. Calculate the i+1 approximation sequence | (k| n) j 

for k=o,...,n using (2.115). 

The same technique may be used when R(k) is singular if the inverses 
appearing in (2.123). (2.124) and (2,125) a re replaced by generalized in- 
verses. 

The method may easily be extended to include the complications of 
nonlinear observations, correlation between the random input w and the 
measurement noise v, and the random input w entering nonlinearly. 
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CHAPTER III 



ESTIMATION OF STATE VARIABLES FOR CONTINUOUS-TIME SYSTEMS 
3,1 Introduction 

The mathematical model for the continuous-time estimation problem 
was discussed in some detail in Chapter I, The explicit problem state- 
ment was given in section 1 „4l . 

The continuous-time estimation problem is analogous in many respects 
to the discrete- time problem of Chapter II. In the continuous- time prob- 
lem we work with probability density functionals instead of probability 
density functions. Specifically, we consider 



the probability density functional for the time segment of x(t) on the 



An introductory discussion of probability density functionals is 
given in Appendix E. For the purpose of this chapter it will be suffi- 
cient to note that the summations appearing in the discrete- time problem 
become integrals in the continuous-time problem. 

3.2 Preliminary Considerations 

We assign a Gaussian a priori distribution for the initial state 
x(t Q ) with mean x(t Q | t Q ) and covariance matrix P(t Q ). It will be con- 
venient to assume that P(t Q ) is positive definite, although this restric- 
tions may be relaxed. 




interval [t Q , t + t] given the observation z^(t) on the interval [t Q ,t], 



We shall use t to designate the present time, t Q to designate 
the time observations begin, and ?' as a general time parameter. The 



estimate of x(r) given z ^ is denoted by x(Y|t). 

In order to estimate x^ ^ when G(t)£(t)G' (t) is non-singular 
we may minimize with respect to x ^ the following functional which 
is the continuous-time analogue of (2.8); 



J(t) = i||x(t 0 )-x(t 0 |t 0 ) || « 

Z ( w 




2 

z(r)-h(x(Y),Y) II . 

R" (f) 



. 2 

+ ||x(r)-f(x(r),Y) II 

[22G'] 



dr 



(3.1 ) 



Equivalently, we may consider minimizing the following functional; 



Kt) =i«x(t o )4(t o |t o ) | 

Z (t o ' 



t 2 2 

+ i f | II z(r)“h(x(r),r) || ^ +||w(r) || . } dr 

t' 1 r (r) £ (r) J 



(3.2) 



with respect to w r . , n and x r . . subject to the constraint 

“ro*^J -IV*] 

x(r) = f(x(Y),r) + G(t)w(r) (3.3) 

This functional, (3.2) subject to the constraint (3.3)* may also be min- 
imized when G(t)£(t)G‘ (t) is singular. 

If we wish to estimate x^ t + T] (3*2) becomes 
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I(t,t + T) = i ||x(t )-x(t | II 

Z (t c ) 



+ i 
^ 2 



/ { II z(r)-h(x(y),7) II ! +11 w(r) || « } <n 

,{ 1 r (r) q er)j 



't + T 2 

+ * i II w(-r) || _ 1 df 






ST'<D 



(3A) 



That is. 



I(t,t + T) = I(t) + i f 



t + T 2 

II w(r)|| « 

t' a (y) 



dY 



(3.5) 



From (3.5) it is clear that I(t,t + T) may be minimized by first min- 
imizing I(t) and then setting w(t’) equal to zero for t - f - t + T. 
This, together with the constraint (3.3), shows that predictions are 
again extrapolations of present estimates. We may, therefore, confine 
our attention to estimating x ^ ^ j . 

3.3 Dynamic Programming Formulation 

Note that because of the constraint (3.3) » minimizing (3.2) with 

respect to w r , and x r . .-i is equivalent to first minimizing with 

\y 0 > L?o»^J 

respect to w ^ and then minimizing with respect to x(t). This leads 
us to define the following "cost* 1 function j 



V(x(t),t) = Min ||x(t )-x(t 1 1 ) || -1 

wrx xi I P (t Q ) 



2 [t 0 ,t] 



i f «£(Y)=h(x(Y),Y) || + 1| w(Y ) || « 

K L R (Y) Q 



2 

Sf'OO 



dY 



(3.6) 
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subject to the constraint (3.3). 

Note that minimizing with respect to wr. .-i leaves a boundary 

"1*0 »“J 

condition on x to be specified. We satisfy this boundary condition by 
specifying the final point x(t). This constitutes a departure from most 
dynamic programming problems in which the initial condition x(t Q ) is 
specified. Our procedure here is similar to that used in case i of 
section 2. -5, in which f”"* existed. No similar assumption need be made 
here because integration of a differential equation may proceed in either 
direction, whereas there is a definite direction associated with a differ- 
ence equation. 

We may give V(x(t),t) an interpretation similar to the one given 
to V n (x(n)) in the discrete-time problem. Let c be any particular value 
of x(t). Then V(c,t) is a measure of the unlikeliness of the most probable 
state trajectory x, , in which x(t) takes on the particular value c, 

— |to«tj 

given the observation z, and the a priori distribution for x(t ). 

[t 0 i>tj 

The estimate of x(t) is that value of x(t) for which V(x(t),t) is minimum. 

Using the separability property we may rewrite (3.6) in the follow- 
ing form; 



V(x(t),t) = Min 

^[E- a t,t] [ 



J V(x(t)- A X,t- At ) + J" [|] z(f)-h(x(^),r) || 



R 



+ II w(f ) II 

if 1 (*) J 



dr 



subject to the constraint (3.3). Expanding V(x(t),t) in a Taylor series, 
letting at approach zero and using the constraint (3.3), we obtain the 
following equation; 



V, = Min 
w(t) 



2 

V' [ f(x(t),t) + G(t)w(t)] + i || z(t)-h(x(t),t) || . 

^ R 00 



+ il|w(t) 



2 00 



(3.7) 



The procedure used to obtain (3.7) is standard in the dynamic 
programming literature. For more detailed discussions of this procedure 
and its relation to the calculus of variations see [3, 5» 1 5» 16 ], 

The w(t) which minimizes (3.7) is easily found by differentiation 

to be 



w(t) = &(t)G'(t)V x (3.8) 

Substituting this expression for w(t) into (3.7) yields the following 
important relation; 



V 

t 



T V' G(t)£(t)G«(t)V 

X 



2 

V* f(x(t),t) + \ 1| z(t)-h(x(t),t) || 1 , 

2 R (t) 

(3.9) 



At this point our problem is twofold; 
i) Find a function V(x(t),t) which satisfies (3.9) » 

i 

ii) Find x(t |t).„ ttye value of x(t) for which V(x(t),t) is minimum. 
A boundary condition for (3.9) is given by the a priori distribution for 
the initial state x(t Q ), since for t = t Q , (3.6) becomes 



V(x(t c ),t o ) = A Hx(t o )»x(t 0 lt 0 ) II 



P (t ) 
— <r 



(3.io) 



Once V(x(t),t) has been found, we may obtain x(Y |t) for t Q -T - t 
by backward integration of the following equation formed by combining 
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(3.3) and (3.8); 



ax( 7 '|t)/c> 7 ' = f(x(r|t),r) + (3.11) 

using x(t |t) as the boundary condition. 

There is no difficulty in including in the dynamic programming 
formulation systems in which the random input w enters nonlinearly. For, 
example, if the basic system is described by the following equation; 

x(t) = f(x(t),t) + G(x(t),t)w(t) (3.12) 

then (3.8) becomes 

w(t) = ^(t)G»(x(t),t)V^ (3.13) 

and (3.9) becomes 

V t = - J 4 G(x(t),t)2(t)G'(x(t),t)Vx - v' x f(x(t),t) 

2 

+ i ||z(t)-h(x(t),t)|| 

5" (t) (3.14) 

For well defined systems described by the following equation; 

x = f(x(t),w(t),t) (3.15) 

(3.7) becomes 

f , 2 

V t = Min J - V f(x(t),w(t),t) + f || z(t)-h(x(t),t) |[ 

w(t)[ - R“ (t) 

2 1 

+ i II w(t) II t 

S =1 (t) (3.16) 
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and (3.8) becomes 



w(t) =2(t)(3f'(x(t),w(t),t)/a w)V x (3.17) 

Equation (3.16) is equivalent to two equations: (3.17) and the 
following equation; 



, 2 
V . = - V f(x(t),w(t),t) + i II z(t)-h(x(t),t) || 1 

t i R" 1 (t) 

+ i II w(t) || 

2” 1 (O 



( 3 . 18 ) 



We shall later relate (3.15) through (3.18) to the two- point 
boundary value problem obtained by Bryson and Frazier [it)]. 

3.4 Solution to the Linear Problem 

The solution of the linear filtering problem discussed in section 
1 .44 was first obtained by Kalman and Bucy [ 34] . In this section we give 
a simplified derivation of this solution based on the dynamic programming 
formulation of the problem. In order to simplify the resulting equations 
it will be convenient to omit the parameter t when no confusion is likely 
to result. 

When the basic system is linear (3.9) becomes 

. , 2 

V t = - f V G(t)£(t)G»(t) V - V F(t)x(t) + ill z(t)-H(t)x(t) || 

£ - - R (t) 

(3.19) 

It is well known [44, 45] that an analytic solution may be obtained for 
equations of this type by expressing V(x(t),t) in a series of the follow- 
ing form; 
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V(x(t),t) = a(t) + b' (t)x(t) + i x> (t)C(t)x(t) 

For our purpose it will be more convenient to write this expression in 
the following symmetric form in which the minimum of the function is 
shown explicitly; 

2 

V(x(t)„t) = j II x(t)-x(t|t) || 1 + k(t) (3.20) 

If (t) 



In order to determine x(t|t), P” 1 (t) and k(t) we substitute V 
and V t obtained from (3.20) into (3.19) and equate terms of like degree 
in x(t). 

Differentiating (3.20) we obtain the following equations; 



V = P” 1 (t) [ x(t)-x(t|t)J 

A 



( 3 . 21 ) 



= j x 1 p“ x 



X* 



+ x» 



9 ^ 

P 



x + d(i x« P -1 x + k)/dt (3.22) 



Substituting (3.21) and (3.22) into (3.19) and equating terms of 
like degree in x, we obtain the following equations after some algebraic 
manipulations ; 

if 1 = - P" 1 GQG» P” 1 - P^F - F'P” 1 + jH * R” 1 H (3.23) 

dx(t |t)/dt = F(t)x(t|t) + P(t)H'(t)R" 1 (t) [ z(t)-H(t)x(t|t) J (3.24) 

o o 

Using the identity P = - P P~ P we may write (3.23) in the following form; 

P(t) = F(t)P(t) + P(t)F'(t) - P(t)H» (tjR’ 1 (t)H^t)P(t) 

+ G(t)$(t)G'(t) (3.25) 



The final form of the optimal linear filter is given by (3.24) 




Fig, 10 Optimal linear estimator 



and (3*25) which are identical to the equations obtained by Kalman and 
Bucy. A block diagram of the optimal estimator is given in Fig. 10. Here 
we again find the intuitively appealing feature of a model of the system 
appearing in a feedback loop. 

As in the discrete-time case, P(t) is the covariance matrix of the 

probability distribution for x(t), given the observations. Equations (3.24) 

and (3.25) remain valid when P(t Q ) is singular as can be shown by taking 

the limit of the discrete-time solution. If P(t ) is singular the dynamic 

— o 

programming formulation breaks down because trajectories passing through 

points which are inconsistent with prior information become infinitely 

more unlikely than trajectories which do not, and we are unable to reflect 

this situation in a scalar "cost" function. This is not a serious drawback 

since the a priori distribution may be chosen to approximate the state of 
"" j 

prior knowledge arbitrarily closely without introducing a singular co- 
variance matrix. As in the discrete- time case, this difficulty is largely 
bypassed if GQG * is non-singular , since then P(t Q + A t) is positive def- 
inite. 



3.5 Approximate Solution to the Nonlinear Problem 



In order to develop an approximate solution to the nonlinear prob- 
lem we follow a procedure similar to the one used in discrete- time. We 
may rewrite (3.9) in the following form; 




il|G'(t)v x \ 

- a(t) 



V x £(*(t)»t) + T II z(t)=h(x*(t|t),t) 



* 2 

~ h(x(t),t)-h(x (t|t),t) || 

r 1 (t) 



( 3 . 26 ) 



Since the estimate of x(t) is the value of x(t) for which V(x(t),t) 
is minimum, we would expect that V(x(t),t) could be closely approximated 
in a neighborhood of this minimum by a quadratic form in x„ Such a 
quadratic form has already been shown to be the solution in the linear 
case. We shall use x (t|t) to designate the estimate produced by this 
approximation. The terms on the right hand side of (3.26) involving f 
and h will be expanded in a Taylor series about the point x*(t|t). The 
argument t should be understood but will usually not be written. All 
partial derivatives are to be evaluated at the point (x*(t|t),t). 

The approximation for f is simply 

f(x(t),t) * f(x*) + (-Sf/ax) [ x-x* J (3.2?) 

The term involving h may be broken down into the following sum of three 
terms; 

2 ll£-h(x*) || + | ||h(x)~h(x*) || - [h' (x)-h» (x*)] If 1 [z-h(x*)] 

£ IT 

The first of these terms is independent of x. The second term is approx- 
imated as follows; 

2 

j l|h(x)-h(x*) II j [x-x* J* [ 3 [( 3 h * / 0 x)R“ 1 h] /a xj [x-x*| 

(3.28) 

The approximation for the third term is 

" [£' (**)] 2” 1 f £~h(x*)J si - [ x°£* ] ' ( Sh'/ ax) R" 1 [ z-h(x*)] 

(3.29) 

If these approximations are substituted into (3.2 6) the following equation 
results; 



V t = - ill G«V X || - V ; { f(x*) + ( a f/a x) [ x-x*J ] 



+ il|2-h(x*) || _ 1 +i [x-x*] 1 { 3 [(ah'/ a xjR" 1 ^] /ax] f £”£*] 

-[£“£*]' ( ah*/ axjR" 1 [z~h(x*)] (3.30) 

An analytic solution of (3.30) may be obtained by assuming that V(x(t),t) 
is quadratic in x„ Specifically, 

. * 2 

V(x(t),t) = ■§ l| x(t)=x (tit) || ^ 1 + k(t) (3.31) 

P (t) 



It is straightforward, but algebraically involved, to substitute 

into (3 o30) the V and V obtained from (3.31 ) to obtain the following 
X x> 

equations which describe the estimator; 

dx*(t|t)/dt = f (x (t|t),t) + P*(t)(s h* (x*(tl tJjtJ/ax)!" 1 (t) 

[z(t)-h(x*(t|t),t)] (3.32) 

P*(t) = (af/ax)p*(t) + P*(t)(af»/3x) + G(t)fi(t)G'(t) 

-P*(t) { a[( 3h c /ax)R a1 h ] /a xj p*(t) (3.33) 

The approximate solution to the problem of estimating the state 

variables of a nonlinear system in real time is given by (3.32) and 

(3.33). For linear systems these equations reduce to those obtained by 

# * 

Kalman and Buoy* The initial conditions for P (t) and x (t|t) are the 
P(t Q ) and x(t 0 |t 0 ) given by the a priori distribution for the initial 
state x(t Q ). 

A block diagram of the nonlinear estimator in a control system 
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Fig. 11 Nonlinear estimator in a control system 



88 . 



is given in Fig. 11. In the diagram the control vector u is shown ex. 

plicitly. Again we find the intuitively appealing design which places 

a model of the basic system in a feedback loop. 

If the random input w enters the system nonlinearly we may retain 

* 

the basic structure of the estimator if we replace G(t) by G(x (tlt) 0 t) 
or (Bf(x (t |t),0t)/a w) -» whichever is appropriate. 

A first approximation to the solution of the problem of estimating 

j|( 

x(f) for t - / - t is given by x (T* |t) which can be obtained by backward 

$ 

integration of the following equation using x (tit) as the boundary 
condition; 



= f(x*(Ylt),'0 + G('l*)fi(r)G»(Y)V x 



A Two. Point Boundary Value Problem 



( 3 . 3*0 



The two. point boundary value problem obtained by Bryson and 
Frazier [lo] using the calculus of variations follows directly from the 
dynamic programming point of view. We shall consider the most general 
case in which the noise enters nonlinearly as described by ( 3 . 15 ) through 
(3.18). The procedure we follow is based on the one used by Dreyfus [l6] 
in his study of the problem of Mayer in the calculus of variations. 

In order to simplify the resulting expressions the arguments x, 
w and t usually will be omitted when no confusion is likely to result. 

The following relation follows from the rules of differentiation; 



dV/dt = 3V /at + ( 5 V / c>x) x 

x' x x' - - 



( 3 . 35 ) 



Differentiating (3.18) with respect to x yields 



= 89 = 



(3.36) 



aV x /at = -.(sf'/sx)V x - (©V x /ax)x 

-(sh'/ax) R” 1 [ z,"h(x,t)] 

"(3w'/3x) [ (3f'/ 5 Ii) V x « fV] 

Noting that by (3.17) the coefficient of (3w'/sx) in (3.36) is zero, 
we may combine (3.35) and (3.36) to obtain the following equation; 

dV x /dt = - (3f'/3£)V x - (^'/a^r 1 [z-h(x,t)] (3.37) 

This equation, together with (3.15) and (3 • 1 7 ) * are the basic relations 
for the two-point boundary value problem,, At the final time t we have 
the following natural boundary condition which corresponds to the 
requirement that x(t|t) is the value of x(t) for which V(x(t),t) is 
minimum; 

V (t) = 0 (3.38) 

X 

The boundary condition at the initial time t is obtained by differentia- 
ting (3.10). This yields 

x(tjt) = x(t Q |t Q ) + P(t 0 )V x (3.39) 

The quantity V plays the foie of the Lagrange multiplier of the 
calculus of variations [15, 16]. If we let V x = \(t) , then the two- point 
boundary value problem may be written in the following form which closely 
resembles that obtained by Bryson and Frazier; 

3x(Ylt)/3f = f(x(f|t),w(7'),y) t 0 ~ r -t (3.40) 

w(t ) = ^)( 3 f'/ 5 w) X(r) t 0 ^ r ^ t (3.41) 
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X(f) = - X(r) - (2>h'/^x) Jf 1 ^) tz(T') - h(x('/|t),r)] 

t 0 ^r 4 t (3.42) 

The boundary conditions are 

X(t) = o (3>3) 

x(t It) = x(t It ) + P(t ) X(t ) (3.44) 

0 Q U O U 

For the important special cases in which the random input w enters 
linearly, the equations for x(T'lt) and \(i ) become 

3 x(f lt)/e>f = f(x(rlt),r) + G(T)fi(r)G»(^) X(T) (3.45) 

xor) = - (af'/s x) \(r) - (sh'/^x) r” 1 (r) [ z(?) - h(x(rit),r) ] 

(3.46) 

We shall have more to say about the two-point boundary value 
problem later when we discuss the relationship between the estimation 
problem and control problems „ 

3.7 Continuous-Tine Systems with Sampled Observations 

In certain applications observations of a continuous-time system 
can conveniently be made at discrete points in time,, We shall discuss 
this situation only briefly, since it requires only minor modifications 
of our previous results. 

Consider the system 

x(t) = f(x(t),t) + G(t)w(t) (3.47) 

Suppose the observations are made at equally spaced intervals of length 
T. The observed signal is 
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z(kT) = h(x(kT),kT) + v(kT) 



(3.48) 



We again assume that the random input w(t) and the measurement noise v(kT) 
are Gaussian with zero mean. The covariance matrices for v and w are as 
follows ; 

E [ w(t)w 5 (7 1 ) ] = fi(t) d" (t-7) 

E [v(kT)v«(jT)] = R(kT) S. k 

E [w(t)v'(kT)] = 0 

If we are between observations n and n+1 (nT - t < (n+1 ) T) , then by 
analogy to previous work, we may minimize the following functional; 



I(t) 



i ||x(o)”£(o |o“) 





2 

illw(v') II •) d T 
fl (V) 



n 2 

+ \ II z(kT)~h(x(kT),kT) || 1 nT - t < (n+1 )T 

k=o “ R" (kT) 

( 3 . 49 ) 

subject to the constraint (3o47)„ 

Here we have assumed for convenience that observations begin at 
t=o e Note that because of the sampled nature of the observations , the 
estimate will be discontinuous at the sampling times* We use x(o |o“) to 
designate the mean of the a priori distribution for the initial state „ 

This corresponds to m of the discrete-time problem. Similarly, x(o|o + ) 
would correspond to x(o|o) of the discrete-time problem. 

Using a standard engineering technique, we let i(t) be a train 
of unit impulses with the impulses occurring at the sampling times kT. 

Then (3.^9) may be rewritten in the following form which closely resembles 
(3.2); 
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R~ 1 (y) 



(3.50) 



subject to the constrant (3 .47). Proceeding formally 



V t = - i v x v x * \ f(x(t),t) + | i(t) I! z(t)-h(x(t),t) || _ 1 



fi“ (t) 



(3.51 ) 



Note that is discontinuous at the sampling times so that the limit 
used to obtain (3.51 ) by analogy to (3.7) does not exist at the sampling 
instants. Proceeding in a purely formal manner, we now use the same 
approximation technique that was employed in section 3.5 to obtain the 
following equations; 



dx*(t lt)/dt = f (x*(tlt),t) + i(t) P*(t + ) ( 3 h s / cSx)#” 1 [ z,(t)-h(x*(t 1 1) „t)J 



P*(t) = (3f/3x) P*(t) + P*(t) (3f«/3x) + G(t)£(t)G' (t) 

for kT < t < (k+1 )T (3.54) 



At the sampling instants we have the following formula resembling the 
results of the discrete-time problem; 



P*(kT + ) = [i + P(kT“) { 3[(3h»/S>x) R =1 h J/^x]]" 1 P(kT) (3.55) 



(3.52) 



P*” 1 = - P*" 1 (^ f/3 x) - (3f'/2x) P* =1 - P*"* 1 GQG 1 P*" 1 , 




(3.53) 



• jjc^ ° ♦ 1 % 

At the sampling instants the identity P = - P P ° P breaks down. 
Between the sampling times we have 



Note that the values of P*(kT + ) and ("3 h /3x) are only 
needed at the sampling times. The estimator could, therefore 
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consist of an analogue model of the basic system in conjunction with a 
special purpose digital computer,, Such a hybrid scheme would simultaneously 
avoid the difficulties of approximating the nonlinear system on a digital 
computer and performing a large number of multiplications on an analogue 
computer. 

3.8 Relation Between the Estimation Problem and Control Problems 

The fact that the solution of the linear filtering problem 
is formally the same as the solution of the problem of controlling 
a linear system with quadratic performance criteria was first pointed 
out by Kalman [31] and the existence of this duality is by now well 
known. Since this duality has received ample coverage in the literature 
[25, 29 , 31, 32 o 33 o 34] we shall not dwell on this point except to 
mention that the nature of this duality is made especially clear by the 
present formulation of the estimation problem in which the estimation 
procedure involves the minimization of a quadratic "cost" functional. 

The close connection between the observability of a system and 
the controllability of the adjoint system will be discussed in Chapter IV. 

It should be clear from the present formulation that there is a 
close connection between the nonlinear estimation problem and certain 
nonlinear control problems. For example,, consider the following system; 

x(t) = f(x(t),t) + G(t)u(t) 

£(t) = h(x(t),t) 

Suppose that we want the output y(t) to follow a known function ,z(t) 

and agree to choose the control u to minimize the following per- 

L^o » ^ J 

formance functional; 
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1 

2 




z(t) - h(x(t),t) 



2 

i + 

R \t) 



II li(t) 



II 



2 -i 

2" 1 (t)- 



dt 



This functional is a weighted combination of a quadratic measure of the 
error and the total "energy" expended „ 

Various methods may be used to attack this problem. We shall 
find it convenient to use the Hamiltonian point of view, so much in vogue 
in the current control literature. Since this section is an aside, we 
shall assume that the reader is familiar with the basic concepts. For 
extensive treatment of these and related matters, the reader is referred 
to [3, 5, 15, 16, 30, 32, 55, 6l]. 

Let us define the following Lagrangian; 

2 2 

L(x(t),u(t),t) = \ | a(t) - h(x(t),t) II + || u(t) || 

R- 1 (t) 2 _1 (t) 



Using the "Maximum Principle", we consider the following Hamiltonian j 

H(x(t),p(t),t) = Max 
u(t) 

The maximizing value of u(t) is easily obtained by differentiation 
u*(t) = fi(t)G'(t)£(t) 

Substituting this value into the expression for H yields 

2 2 

H(x(t),2(t),t) = £' (t)f(x(t),t) + j || G' (t)£(t)|| - £ || z(t)-h(x(t),t) || 1 

fi S (t) 

The canonical equations for this problem, 

x(t) = dH/ 

£(t) = - sh/ a x 



£‘ (t) [ f(x(t),t) +G(t)u(t)] - L(x(t),u(t),t)| 
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are the same as (3.45) and (3.46) if we equate jd( 4) and i(t). Further, 

we can easily recognize (3® 9) as the Hamilton- Jacobi partial differential 

) . 

equation [15, 16, 30j » 

V t + H(x(t),V x ,t) = o 

We can make the boundary conditions the same as those for the estimation 

problem by leaving the final state x(T) unspecified and by assigning a 

for starting in a state other 

o> 

than a nominal initial state x (t Q ). Specifying the initial state would 
correspond to the smoothing problem in which the initial state were known 
exactly; that is, P(t Q ) = 0. 

The regulator problem z(t) = o corresponds to a smoothing problem 
in which the observed value of signal plus noise is equal to zero on 
[ t Q , T] „ While such a set of observations is extremely unlikely, it is, 
nevertheless, the most likely curve jj for a stable system with the 

mean of the a priori distribution for x(t Q ) equal to zero® 

While this formulation of the control problem is somewhat arti- 
ficial since, except for the regulator problem, the desired output is 
usually not known in advance, the point we wish to make is that the 
problem of finding the minimum energy control so that the output of the 
system follows a desired trajectory using a quadratic performance criteria 
is mathematically identical with the problem of finding the most likely 
random input after making noisy observations of the output. 



quadratic "cost" ||x(t Q ) - x*(t 0 ) | 



r 1 (t 



CHAPTER IV 



OBSERVABILITY AND CONTROLLABILITY 
4 „ 1 Introduction 

The purpose of this chapter is to give a brief introduction to the 
concepts of observability and controllability and to relate these ideas 
to the present formulation of the estimation problem. These concepts 
were introduced by Kalman [32, 34] and have been studied by him [33] and 
his associates [ 70] . The paper by Bertram and Sarachik represents an 
early contribution^ [69] . 

We shall confine our attention to discrete- time systems since 
certain problems arise in discrete-time systems which do not exist in 
their continuous-time counterparts and since the results are easily ex- 
tended to continuous-time systems which have been studied more extensively 
in the recent literature [33 0 70]. 

The definitions of controllability and observability which we 
shall introduce will differ slightly from those given elsewhere. The 
modified definitions will enable us to include linear systems with singular 
transition matrices and — in principle — nonlinear systems. They will 
also simplify the development of the relation between observability of 
a system and controllability of the adjoint system which is of particular 
significance in the estimation problem. 

The material in section B 6 of Appendix B will be continually re- 
ferred to and provides necessary background for this chapter. 

4.2 Preliminary Considerations 

Consider the following linear discrete-time system; 
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x(k+1 ) = F(k)x(k) + G(k)u(k) (4.1 ) 

^(k) = H(k)x(k) (4.2) 

In section 1 „3 we showed that the forward transition matrix for this 
system may be expressed in the following form; 

I(m+1„k+l) = F(m)F(m-1 ) • • • F(k+1 ) m > k (4.3) 

The adjoint system of (4,1) and (4,2) is defined to be 

x(k-1 ) = F* (k)x(k) + H> (k)u(k) (4,4) 

Z(k) = G» (k)x(k) (4,5) 

The backward transition matrix for the adjoint system is 

5(k,m) = F'(k+1) ° 9 • F' (m-1 )F' (m) m > k (4,6) 

Comparing (4,3) and (4,6) we obtain the following important identity; 

5(m+1 »k+1 ) = (k„m) (4,7) 



The relation between a system and its adjoint will be discussed in some 
detail in the sequel, 

4,3 Definition of Observability 

The question of observability has to do with the basic structure 
of the system under consideration. Roughly, the question is this; "Given 
a finite sequence of exact measurements of the output of an unforced system, 
can one uniquely reconstruct the corresponding state sequence?" If the 
answer is yes, the system will be called completely observable on the 
sequence of times during which the measurements are made. 



In order to develop some insight we shall first examine the 
question of observability for a time-invariant discrete-time system 
before giving a more precise definition of observability and considering 
its ramifications. Consider the following system; 



x(k+1 ) = F x(k) + G u(k) 


(4.8) 


Z(k) = H x(k) 


(4.9) 



with the state vector x, and n- vector, and the output If the system 
is unforced^ u_ = o. Suppose that we make n consecutive measurements of 
the output £ 0 For convenience we assume that the first measurement 
occurs at time zero, dearly, this involves no loss of generality in the 
time-invariant case. We may then write the following equations relating 
the output sequence [^(o) , . o . ,£(n-1 ) ] and the initial state x(o); 

Z(o) = H x(o) 
y(1 ) = H F x(o) 



y(n=1 ) = H F^ 1 x(o) 
or, equivalently 



Z(o) 




H 


Z(l) 

o 


s 


H F 

o 


o 

o 

. z(n~1 ) 




o 

0 

H F 1 ^ 1 



x(o) 



(4.10) 



We wish to know whether or not the initial state x(o) can be determined 
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uniquely as a result of the measurements of the output. Once the initial 
state x(o) is known, the sequence of states [ x(k)^ (for k - o) of the un- 
forced system is given by (^.8), We implicitly assume that (4.10) has a 
solution; that is the output sequence does not contradict the model of 
(4.8) and (4,9), From elementary matrix theory we know that x(o) can be 
determined uniquely if and only if the matrix on the right side of (4,10) 
or, equivalently, its transpose [H 1 , F'H ! ,...,F' ” H' ] has rank n. More- 
over, if the initial state of the time-invariant system cannot be deter- 
mined after n observations of the output, additional observations will 
convey no new information because, by the Cayley-Hamilton theorem [ 1 7 ] , 
any positive integer power of an n by n matrix F may be expressed as a 
linear combination of I, F,.,o, F 11 "^ and, therefore, additional observations 
are simply linear combinations of the first n observations. 

In devising a workable definition of observability for discrete- 
time systems we are faced with a problem that does not arise in continuous- 
time systems, namely that a discrete-time system may be described by a 
forward difference equation, for example (4,8); or by a backward differ- 
ence equation, for example (2.34), 

Consider the forward system 

x(k+1 ) = f (x(k),k) (4.11) 

and the backward system 

x(k-1 ) = f(x(k),k) (4.12) 

the output of both systems being given by an equation of the following form; 



£(k) = h(x(k),k) 



(*M3) 



In order to include both types of systems, we propose the following 
definition of observability. 

Definition ; A discrete-time system will be called completely observable 
on a set of times { j, j +1 , ...,m ~1 ,m] if, by observing the output sequence 

» one can uniquely determine the state sequence [x( j) , . . . ,x(m)j 
for all possible state sequences. 

It is clear that for the forward system (4.11) and (4.13) it is 
necessary and sufficient to determine x(j) 3 nd for the backward system 
(4.12) and (4.13) it is necessary and sufficient to determine x(m). 

4.4 Definition of Controllability 

The question of controllability, like the question of observability, 
has to do with the basic structure of the system under consideration. 

Roughly, the question is this ; "Given any two states Xj and x^, can one 
choose a finite input sequence in such a manner that the corresponding 
state sequence will be of the form ^x-j ,... 5 X 2 ]?" If the answer is yes, 
the system will be called completely controllable on the set of times 
during which the state sequence occurs. 

In order to develop some insight we shall again consider the 
question of controllability for a time -invariant discrete-time system 
before giving a more precise definition of controllability. Consider 
the following system; 

x(k+1 ) = F x(k) + G u(k) (4.14) 

with the state vector x, an n- vector, and the control input, u. Suppose 
that we wish to move the system from a known initial state x(o) to a 
desired state x^ at time n. We may then write the following equations 
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relating the state sequence and the control sequence; 

x(1 ) = F x(o) + G u(o) 

x(2) = F 2 x(o) + F G u(o) + G u(1 ) 



o 



x^(n) = F 11 x(o) + F 0 ^ G u(o) + • « ® + G u(n-1 ) 



(4.15) 



We may rewrite this last equation in the following form; 



3^(n) - f 1 x(o) = [ if” 1 G,„ tl FG, G ] 



u(o) 



u(n-1 ) 



(4.16) 



We see that a solution of (4.1 6) vail exist if and only if 2L,(n) - F 11 x(o) 

- d — — 

is a linear combination of the column vectors of the matrix [ F 11 " 1 G „ . . . , F G „G] 
in which case x^(n) „ x(o) belongs to the range of that matrix. If. the 
matrix [F n “'* G„ „ . . , F G„G ] has rank n, then any final state may be reached 
from any initial state in n transitions. Moreover, by the Cayley-Hamilton 
theorem, any state which can be reached from x(o) can be reached in (at 
most) n transitions. For the special case of a single input system, the 
matrix G becomes a vector g and we obtain the results that every final 
state may be reached from every initial state if and only if the matrix 
[p n ”1 g tcoCf F £, g_] is non-singular and that any state which can be 
reached, can be reached in (at most) n transitions. These results are well 
known [32, 69] and are the basis for the design of "dead beat" systems. 

In devising a workable definition of controllability we are again 
faced with the problem of forward and backward difference equations. Consider 
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the forward system 5 



x(k+ 1 ) = f(x(k)s> u(k),k) (4,17) 

and the backward system 5 

x(k» 1 ) = f(x(k), u(k),k) (4.18) 

In order to Include both types of systems,, we propose the following defini- 
tion of controllability,, 

Definition ; A discrete- time system will be called completley controllable 
on a set of times [ j „ j+1 ra-1 „mj if for every x^ and x^ a control 

sequence [ u(j ),„,,, u(m) ] exists such that x(j) = x.j and x(m) = x^. 

It is clear that for the forward system (4,17) only the sequence 
{ u(j ) , 000 i>]i(m- 1 ) 1 ^- s pertinent and for the backward system (4,18) only 
the sequence { u( j +1 ) „ . , . ,u(m) j is pertinent, 

4,5 Controllability and Observability for Linear Discrete-Time Systems 

The questions of observability and controllability for time-varying 
linear discrete- time systems are closely related to those for the time- 
invariant systems already discussed. Let us consider the system of (4,1) 
and (4,2) which we rewrite for convenience 

x(k+ 1 ) = F(k)x(k) + G(k)u(k) (4,1 ) 

£(k) = H(k)x(k) (4,2) 

If we observe the output sequence [ ^(j) s ,,,,2(”i)l for the unforced system, 
u e o, then we may write the following equation relating the output sequence 
(Z(j)i.«<>. 0 £(m) j to the initial state x(j); 
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(*. 19 ) 



z(j) 




' H(j) 


z( j+1 ) 
0 


— 


H(j+1) i(j+1,j) 
0 


e 

o 

z(*0 




0 

0 

H(Vm) I(m,j) 



Since this is just a special case of the equation Ax = £ which is con- 
sidered in Appendix B, we may apply lemma B6 0 1 to obtain the following 
result,, 

Lemma 4.51 s The system (4.1 ) and (4.2) is completely observable on the 
set of times { j, j+1 „ . . . 0 m-1 ,m j if and only if the following matrix is 
positive definite; 

m 

M f (j,m) = I” ( k „j)H ! (k)H(k)i(k,j) 

k=j 

If M^(j,m) is positive definite, then the unique solution for x(j) is 

m 

x(j) = (j»m) I'(k,j)H ! (k)^(k) 

k=j 



o o o 

From (4.21 ) we see that it is not necessary to know the sequence 
merely the following sum; 



IZ I'CkJ)!* (k)^(k) 
k=j 



Note that if M^(j,m) is positive definite, then so is M^(j,m+k) for k > 
Let us now consider the question of controllability of the same 
system, (4.1). If we consider the control sequence [ u(j ), . . „ ,u(m-1 )j , 



(4.20) 



(4.21 ) 



- 104 - 



then we may write the following equation relating the states x(j) and 
x(m) and the control sequence 



x(m) - I(m,j)x(j) 



[ I (m . j+1 )G ( j ) , . • • ,G (m-1 ) ] 



H (j) 



o 

© 

u(m-1 ) 



(4.22) 



Again this is just a special case of the equation Ax = y and we immediately 
obtain the following lemmas as special cases of lemmas B6.2 and b 6„3 re- 
spectively. 



Lemma 4„52 ; The state x(m) i^y be reached at time m from the state x(j) at 
time j if and only if x(m) - I(m 5 j )x( j) belongs to the range of the follow- 
ing matrix | 



m-1 

W„(j,m) = 'V' I(m 0 k+1 )G(k)G* (k)l' (m 9 k+1 ) 

K=j _ “ (4.23) 



In this case, the solution which minimizes 



m-1 2 

i l|n( k ) II 

k=j 



u(k) = G ll (k)i , (m 0 k+1 J^Cj.m) [ x(m) - i(m„ j)x(j) J k=j,„..„m-1 (4.24) 



Lemma 4.53 ; The system (4.1 ) is completely controllable on the set of times 
{j» j + 1 too . ,m-1 ,mj if and only if W^Cj^m) is positive definite. In this case 
the solution which minimizes 
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m-1 2 

x: 111*0011 

k=j 

is 

u(k) = G ! (k)l' (m,k+1 )W f _1 (j 5 m) [ x(m) - I(m, j)x(j)] k=j,...,m-1 (4.25) 

© o o 

Let us now consider a system described by a backward difference 
equation. In particular, let us consider the adjoint system given by (4.4) 
and (4.5) which we rewrite for convenience 

x(k~1 ) = F'(k)x(k) + H'(k)u(k) (4.4) 

Z(k) = G> (k)x(k) (4.5) 

In order to examine the question of observability we write the following 
equation relating the output sequence { £( j ),... ,£(m)] and the state x(m) 
for the unforced system; 



1 

O 

C_i. 


= 


G«(j) £(j.m) 

o 


x(m) 


o 

0 

£(m-1 ) 


4** 


o 

o 

G' (m-1 ) f (m-1 t m) 




£(m) 




G* (m) 


(4.26) 



Again we note that (4.26) is just a special case of the equation Ax = y 
and we immediately obtain the following lemma as a special case of B6.1. 

Lemma; 4.54 ; The system (4.4) and (4.5) is completely observable on the 
set of times {j,...,mj if and only if the following matrix is positive 
definite ; 
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(4.2 7) 



m 

M b (j 0 m) = ]T S«(k,m)G(k)G»(k)S(k,m) 
k=j 

If M (j 8 m) is positive definite the unique solution for x(m) is 

“b 

.1 A 

x(m) = M b (j„m) f (k„m)G(k) 2 ;(k) 

o o o 

Using the identity (4.7) we may write (4.27) as follows; 
m 

j ,m) = V |(m+1 ,k+1 )G(k)G« (k)S» (m+1 ,k+1 ) = W_( j ,m+1 ) 
lc=j “ 

A comparison of (4.23) and (4.29) yields the following theorem. 

Theorem 4 ,,55 ; The system (4.1) and (4.2) is completely controllable on 
the set of times { j,...,m+l( if and only if the adjoint system (4.4) and 
(4„5) is completely observable on the set of times { j , . « , t m] . 




Let us now examine the question of controllability of the adjoint 
system. We consider the control sequence [ u(j+1 ),... »u(m)j and obtain 
the following equation relating the states x(j) and x(m) and the control 
sequence ; 



x(j) - I(j»ro)x(m) = [ H' (j+1 )„... } f(j s ,nu.2)H , (m-.1 J^jjm-I )H* (m)] 



u( j+1 ) 



© 



u(m-1 ) 
u(m) 



(4.28) 



(4.29) 



(4.30) 
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Again this is just a special case of the equation Ax = % and we obtain 
the following lemma as a special case of b6„3o 



Lemma 4.56 : The system (4.5) is completely controllable on the set of 

times {j„j+1 „ „ . . t m-1,m}if and only if the following matrix is positive 
definite; 



m 

W.(j,m) = Y I(0.k-1)H'(k)H(k)i*(j t k-l) 

b k^3+1 (4.31) 



If W b (j,ra) is positive definite the solution which minimizes 



m 2 

Y J llu(k) II 

k=j+1 



IS 



u(k) = H(k)S' (j,k-1 )W b ” (j „m) [ x( j) - *( j ,m)x(m)] k=j+1 t „..,m 



(4.32) 



Using the identity (4.7) we may rewrite (4.31) in the following 



form; 



m 

W b ( j »m) = Y 1' ( k »j +1 )H' (k)H(k)$(k, j+1 ) = M f (j+1 ,m) 

k=j+1 (4c33) 

A comparison of (4.20) and (4.33) fields the following theorem. 

Theorem 4,57 ; The system (4.1) and (4.2) is completely observable on 
the set of times { j.,j+1 ... . »m} if and only if the adjoint system (4.5) 
and (4.6) is completely controllable on the set of times j j-1 ,j,...,mj. 



It is interesting to note that the lemmas given elsewhere [70J 
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correspond to the results we have obtained for the backward system. For 

<*•1 

discrete- time systems in which F“ (k) exists and for continuous-time sys- 
tems the distinction between forward and backward equations does not arise. 

In these cases the basic lemmas may be stated in alternate forms corres- 
ponding to those we have obtained for the forward and backward systems. 

We may easily relate theorem 4.57 to the general solution of the 
linear smoothing problem given in Appendix D. Consider the case in which 
the random input w and the measurement noise v are uncorrelated. For 
convenience we rewrite (DA?) in the following form; 

# 

\(k-1) = F 1 (k)_\(k) + H ! (k) [ H(k)P(k)H> (k) + R(k)] 

{ z(k) - H(k) [ x(k I k— 1 ) + P(k)F» (k)_X(k)] j (4.34) 

In addition from (D.48) we obtain 

x(o|n) = m + P(o)^(-1 ) (4.35) 

From (4.35) we see that the estimate of the initial state may take on 
all possible values only if _X(-1 ) can take on all possible values, but 
from (4.34) we see that _X(=1 ) can take on all possible values, only if 
the adjoint system is completely controllable on the set of times 
(-1 ,o,.„. ,nj. The solution of the linear estimation problem given in 
Appendix D is optimal even if the system is not completely observable 
and the optimal procedure is simply to estimate the a priori mean for 
those states which are unobservable. 

If we note that the expression 

z(k) - H(k) [x(k|k-1) + P(k)F'(k)X(k) ] 
appearing in (4.34) is the estimate of the measurement noise v(k) we 
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see that an absurd situation would arise if the system were not com- 
pletely observable and the adjoint system were completely controllable. 

For then we would be using the estimate of the measurement noise to modify 

I 

the a priori estimate of certain state variables when the measurements 
themselves contained no information about those state variables. 

We may also relate theorem 4.55 to a simple control problem. Con- 
sider the problem of moving the system (4.1) from a specified initial 
state x(j) to a desired state x^ in n transitions using the control 
sequence for which the "energy" expended is minimum, "Energy" is defined 
to be 



m-1 2 

S £ ll£( k ) II = j +n 

k =3 



To solve this problem we minimize the following expression; 



m-1 2 

i||u(k) || + VOO [i( k+ 0 - Z(k)x(k) - G(k)u(k)] 

k=o 1 (4.36) 



subject to the constraint 



m— 1 

x,(m) - I(m,j)x(j) = Y2 I(ra, k +1 )G(k)u(k) 

k=j (4.37) 



Setting the partial derivatives of (4.36) with respect to u(k), X(k) and 
x(k) equal to zero yields the following equations; 



x(k+1 ) = F(k)x(k) + G(k)> (k) 


(*.0 


A( k “1 ) = F» (k)X(k) 


(4.38) 


u(k) = G« (k)A(k) 


(4.39) 
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T' < r 

Note that u(k) is the output of the adjoint system. From (4.38) we may 
obtain the following expression; 

X(k) = i(k,m-1 )A(m-1 ) (4.40) 

Substituting (4.38) and (4.40) into (4.37) yields 

m~1 

2 ^(m) - i(m, j)x(j) = g I(m,k+1 )G(k)G« (k)i(k,m-1 )A(m-1 ) (4.41 ) 

Using the identity (4.7) we may rewrite (4.41 ) in the following 
two equivalent forms; 

2 ^(m) - S(m, j)x(j) = W f (j,m}A(m~1 ) ,m = j+n (4.42) 

x^m) = 2(m, j)x(j) = M^j.m-I ) \(ra-1 ) 0 m = j+n (4.43) 

Lemma 4.52 says that the state x^ may be reached from x(j) in n transitions 
if and only if some ^(m-1 ) satisfies (4.42). Lemma 4.53 says that every 
state may be reached from x(j) in n transitions if and only if there is 
a unique ^(m-1 ) satisfying 4.42. If we consider u(k) to be the output of 
the adjoint system then, by lemma 4.54, A_(m-1 ) can be uniquely determined 
on the basis of the following sum; 

m-1 

2 ^(m) - I(m„j)x(j) = V 1 5 (m-1 ,k)G(k)u(k) 

k=j 

if and only if the adjoint system is completely observable on the set of 
times {_ j,...,m-lj. Theorem 4.56 simply says that the system is completely 
controllable if and only if the Lagrange multiplier _^(m-1 ) may be determined 
uniquely; that is, if and only if the adjoint system is completely observable 
on the set of times [ j , . » . ,m-1 ] . 
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At present almost nothing is known about controllability and 
observability for nonlinear systems. A theorem on local controllability 
of nonlinear systems was proven by Markus and Lee [71 ] by assuming that 
the linear variational equations were completely controllable. The dis- 
cussion of their paper by Kalman is also of interest. 

In spite of the present lack of understanding of these matters, 
there still seems to be a strong connection between observability of a 
nonlinear system and controllability of the adjoint system. This is in- 
dicated by the fact that the ubiquitous two-point boundary value problems 
involve the adjoint equations. For example, consider a system in which 
the state vector may be divided into a completely unobservable part x 2 
and a remaining part x^ , in such a way that the equations for the system 
may be written in the following form; 



^(k+1) = f^OO.uCkhk) 
x 2 (k+1) = f 2 (x-| (k),x 2 (k),u(k),k) 



£(k) = h(x.| (k)„k) 



If we examine the adjoint system 



X(k~1) = (3f'/sx) A(k) + (3h'/ax) v(k) 



or, more specifically 




(k-1 ) 
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we find that there is no trajectory along which the linearized adjoint 
system is completely controllable „ 

Similarly, if we consider a system with obviously uncontrollable 
state variables of the following form; 



£, (k+1 ) = f 1 (x l (k),x 2 (k),u(k),k) 
x 2 (k+1 ) = f^x^k^k) 

we find that there is no trajectory along which the following adjoint 
system is completely observable; 



A, (k-1 ) ' 




1 1 


0 




A, 00 ' 


A 2 ( k ~ 1 ) 




i 

Q) 

oT 

A* 


_ 




1 

I 



1 




0 ] 



^00 

X (k) 

~2 



If we were to consider the problem of observing a small perturbation 
of the initial state x(j) of the system (4 0 11) and (^» 13 ) we would investi- 
gate the variational equations 

S x(k+1 ) = (3f/ s £) £x(k) 

<?£(k) = ( a h/ x) <fx(k) 

Complete observability of the variational equations along the nominal tra- 
jectory would insure local observability of a small perturbation in x(j),, 
Complete observability of the variational equations is, of course, the same 
as complete controllability of the adjoint equations „ 
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CHAPTER V 



EXPERIMENTAL RESULTS 



5.1 Introduction 

In this chapter we shall present the results of the application 
of several of the techniques of Chapter II to some particular examples. 

The aim of these numerical studies is to give an indication of the type 
of performance that can be expected when approximation methods are used 
in nonlinear estimation problems. With regard to the approximation 
technique for the filtering problem given in section 2.81 ; the behavior 
of the P* matrix, the stability of the filter, the effect of known con- 
trol inputs, and the overall quality of the estimates, are all of prime 
importance. With regard to the smoothing problem; the quality of the 
first approximation sequence { x*(k|n) j and the convergence properties 
of the successive linearization technique, are the basic factors to be 
considered. While one cannot draw positive general conclusions from 
specific examples, we feel that the results of these simulation studies 
indicate that the methods under consideration hold great promise. 

5.2 Pictures and Random Variables 

The simulation studies discussed in this chapter were performed 
on an IBM 7090 computer. The results will be displayed in a series of 

i 

pictures; each of which represents a time history of 1000 points, time 
running from left to right. The scale for each picture will be indicated 
by giving the maximum in absolute value of the variable or by referring 
to an adjacent picture of the same scale. Positive values of the variables 
lie below the horizontal axis. 
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Gaussian random variables were approximated by a sum of twelve 
independent pseudo-random variables, each having a uniform distribution 
on the interval [ 0 , 1 ] . 

5.3 Example ; As a first example a first order system with a randomly 
varying parameter was investigated. Such a system could be a model for 
a noisy RC network in which the resistance varied in a random manner. The 
continuous- time version of the system is given by the following equations; 



X 1 (t) 


= [a + x 2 (t)] x 1 (t) + u (t ) + W 1 (t) 


(5.1 ) 


x 2 (t) 


= b x^(t) + w 2 (t) 


(5.2) 


z(t) 


= x 1 (t) + v(t ) 


(5.3) 



where u(t) is a known input; w^t), w 2 (t) and v(t) are white Gaussian 
noise processes; z(t) is the observed output; and the random parameter 
x 2 (t) is the output of a linear system excited by white Gaussain noise. 
Because the simulation was to be conducted on a digital computer, this 
system was approximated by the following discrete-time system; 

l 

x 1 (t+h) = x 1 (t) + [a + x^t)] x 1 (t)h + u(t)h + w 1 (t)h (5.4) 

x 2 (t4h) = x 2 (t) + b XgCtJh + w 2 (t)h (5.5) 

z(t) = x 1 (t) + v(t ) (5.6) 

For a = - 1.0, the value of .01 was used for h. This corresponds 
to a sampling period of 1 0 milliseconds for a system with an average time 
constant of one second. 

The recursive scheme of (2.73), (2.74) and (2.77) was used to 
obtain estimates of x^ and x^ in real time. The successive linearization 
technique of section 2.84 was used to obtain a solution of the smoothing 
problem. The equations describing the estimation procedure for this 
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problem can easily be obtained by straightforward substitution into the 
basic equations of Chapter II. 

5.31 Case I 

The following data pertains to the simulation results shown in 
ELate I. 



system parameters; 


a = - 1 .0 
b = - 0.1 
h = 0.01 






initial conditions; 


x^ (o) = 1.0 

x 2 (o) = - 0.5 






a priori distribution; 


m^ = 0.0 

ra 2 = 




Pn (o) = 0.5 

Pi 2 (°) = o.o 
P 22 (°) = 0.6 


noise levels; 


E [w 1 (j)w 1 (k)] = 


36.0 


fjk 




E [w 2 (j)w 2 (k)] = 


1.0 


^jk 




E [v(j)v(k)] = 


9.0 


^jk 



w^ , w 2 and v are statistically independent. 
Discussion ; This data set represents a rather severe noise condition as 
indicated by parts B and C of Plate I. The real time estimate of 
closely followed the actual value. The error in the a priori estimate 
of Xg was overcome soon after the known input u excited a transient, as 
can be seen in parts G and H. 

The results of the application of the successive linearization 
technique to the smoothing problem are shown in parts F and I. The iter- 
ation procedure was terminated when the maximum deviations between the 
ith and the i+1 approximations for both x^ and Xg were less than one per- 
cent of the maximum value of the quantity being estimated. This criteria 
was satisfied by the third approximation. The difference between the 
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first three approximations could not be detected in the pictures. 

Note that in the smoothed solution for x 2 shown in Part I, the error 
in the a priori estimate was overcome. 

The known input u 'was chosen as a representative example of the 
type of saturating signals occurring in control systems. The total com- 
puter time used for the entire simulation was 0.73 minutes. 



5.32 Case II 



The following data pertains to the simulation results shown in 
Plate II. 



system parameters; 


a = -1.0 

b = 0.0 

h = 0.01 


initial conditions; 


x. (o) = 2.0 

x 2 (o) =-1.5 


a priori distribution; 


o o 

O 0 

o o 
II II 

j=r ^ 


noise levels; 


E [w 1 (j)w 1 (k)> 
E [w 2 (j)w 2 (k)]= 
E [v(j)v(k)] = 



p 11<°> 

p 12$ 0 ) 

p 22 (o) 

16.0 

jk 

0.0 



9.0 



1.0 

0.0 

0.6 



, w 2 and v are 



statistically independent. 



Discussion ; For this data set the parameter x^ is an unknown constant. 
The severity of the noise conditions is indicated by parts B and F of 
ELate II. In part D it can be seen that soon after the known input u 
excited a transient, the estimator was able to estimate Xg very closely. 
The estimate of x^ is shown in part H. The improvement in the estimate 
of x.j with increasing time can be attributed to the information in the 
transient and the improved estimate of x-,. 
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5.4 Example ; As a second example, a second order system with a randomly 
varying natural frequency was investigated. The continuous-time version 
of the system is given in the example of section 1.43. Because the sim- 
ulation was to be conducted on a digital computer, this system was approx- 
imated by the following discrete-time system; 

x 1 (t+h) = x 1 (t) + x 2 (t)h (5.7) 

x^t+h) = x £ (t) - b x 2 (t)h - [ c Q + x^(t) ] x 1 (t)h + u(t)h + w 1 (t)h 

(5.8) 

x^(t+h) = x^(t) - a x^(t)h + w 2 (t)h (5.9) 

z(t) = x 1 (t) + v(t ) (5.10) 

The recursive scheme of (2.73). (2.74) and (2.77) was used to 
obtain estimates of x^ , Xr, and x^ in real time. The equations describing 
the estimation procedure for this problem are easily obtained by straight- 
forward substitution into the basic equations of Chapter II. 

5.4l Case, III 

The following data pertains to the simulation results shown in 
Plate III. 

system parameters; 

initial conditions: 
a priori distribution; 



a = 0.1 

b = 0.2 



*.(o) = 0.5 
x 2 (o) = 0.5 
x^(o) = - 0.5 



P-1-1 (o) = 1.0 
p 1? (o) = 0.0 
Pi 3(0) = 0.0 



p 22 (o) = 0.5 
p ? q(o) = 0.0 

P 3 3(o) = 0.5 



mi = 0.0 

bu = 0.0 

m3 = 0.0 
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noise levels; 



E [w 1 (j)w 1 (k)J = 0.16 6^ k 

E [w 2 (j)w 2 (k)] = 225.0 <5^ k 

E Lv(j)v(k) ] = 0.64 <$j k 

, w 2 and v are statistically independent. 

Discussion ; In this data set large and rapid changes in the natural fre- 
quency [c Q + Xj] occur. The noise conditions are moderate as shown in 
parts B and C of Plate III. The real time estimates of the state variables 
are seen to follow satisfactorily the rapid changes in the state variables. 
Note that, since Xg is the discrete-time analogy of the derivative of , 
the job of estimating this quantity is intrinsically difficult. Again 
there is improved behavior after the known input u excites a transient. 

The behavior of the P matrix shown in parts J through 0, is typical of 
the behavior observed in a large number of runs. The total computer time 
used for the entire simulation was 0.74 minutes. 

5.42 Case IV 

The following data pertains to the simulation results shown in 
Plate IV. 



system parameters; 


a = 0.0 

b = 1.0 
c = 0.0 
h = 0.01 




initial conditions; 


x-j ( 0 ) - 0.0 

x 2 (o) — 0.0 
x^(o) = 4.0 




a priori distribution; 






m 1 = 0.0 
m 2 = 0.0 
m^ = 1.0 


P-i -1 (0) = 0.0 
p 12 (°) = 0.0 

P 13 (o) = 0.0 


P ? o(o) = 0.0 
p£a(o) = 0.0 
P 33 (o) = 4.0 
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noise levels: 



E fw 1 (j)w 1 (k)] = 1.0 «fj k 
E [> 2 (j)w 2 (k)] = 0.0 
E L v (j)v(k)] = 0.01 

w.j j w 2 and v are statistically independent. 

Discussion : For this data set there is no known input. The natural fre- 

quency x^ is an unknown constant. The noise condition is very severe as 
indicated by part A of Plate IV. The estimate of takes about 750 
sampling periods to overcome the error in the a priori estimate and settle 
on a good estimate of x^. Once this has occurred, the estimates of the 
other state variables also improve. The total computer time used for 
the entire simulation was 0.64 minutes. 

5.43 Case V 

The following data pertains to the simulation results shown in 
ELate V. 



system parameters: 


a = 0.0 

b = 0.5 
c Q = 0.0 
h = 0.01 






initial conditions; 


x 1 (o) = 0.0 

x^o) = 0.0 
* 3 (o) = 3.0 






a priori distribution: 








o o o 
© 0 0 
o oo 

II II II 


Pi 1 (o) = 1.0 

P 12 (°) = o.o 
P 13 (o) = 0.0 




P 22 (o) = °*5 

Poq(o) — 0.0 

P^Co) = 4.0 


noise levels: 


E [Wi (j)w 1 (k)] = 


0.25 


*jk 




E [w 2 (j)w 2 (k)] = 


0.0 






E [v(j)v(k)] = 


1.0 


s » 



w.j „ w 2 and v are statistically independent. 
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Discussion ; For this data set the natural frequency of the system is 

again an unknown constant; but this time there isiknown input. The noise 

levels are moderate as indicated by parts B and C of Elate V. Again notice 

that the estimate of x^ overcomes the error in the a priori estimate soon 

after the known input excites a transient. The overall performance of 

the estimator is quite satisfactory. The behavior of the P* matrix is 

* 

typical. Note that the "variance" p^ decreases rapidly as the estimate 
* 

x^(klk) improves. The total computer time used for the entire simulation 
was 0.68 minutes. 

5 M Case VI 



The following data pertains to the simulation results shown in 
Plate VI. 



system parameters; b = 0.5 



0.5 

2.0 

0.01 

0.1 

0.2 



11 V e V 

(system) a = 0.1 

(estimator) a = 0.2 



initial conditions; - n a 




a priori distribution; 




(system) E [w 2 (j)w 2 (k)] = 0.0 



(estimator) E [w 2 (j)w 2 (k)] = 225.0 <^ k 



E [v(j)v(k)j = 1.0 cTj k 



w.j „ w 2 and v are statistically independent. 
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Discussion; For this data set the estimator had imperfect information 



about the statistical behavior of Xy The actual value was the geometric 
curve obtained as the output of the following system; 

x^Ct+h) = x-j(t) - 0.1 x^(t)h x^(o) = 7.0 

The estimator was designed under the assumption that x^(t) was the out- 
put of a linear system of a different time constant excited by white noise. 

The noise conditions were moderate to severe as is indicated by 
parts B and C of ELate VI. Despite the large error in the a priori es- 
timate and the misinformation about the statistical nature of x^ 5 the es- 
timator quickly overcomes the initial error and gives satisfactory per- 
formance, even before the known input u excites a transient in the system. 
The total computer time used for this simulation was 0.76 minutes. 

5.5 Conclusions 

I 

$ 

In all cases studied the elements of the P matrix were well 

behaved. In fact, they were so well behaved that it seems likely that 

in an application of these techniques, a simplified version of the filter 

could be obtained by presetting the values of the diagonal elements of 
* 

the P matrix to agree with the average behavior of these elements in 
previously conducted simulation studies. This would be very desirable 
if an analogue computer were to be used, since it would reduce the 
number of multipliers needed. 

No instability of the estimator was observed. 

The effect of known inputs is to excite transients which contain 
information about the system parameters and simplify the estimation 
problem. The exact nature of the known input does not seem to matter as 
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long as transients are excited. 

The performance of the estimator in Case VI indicates than an 
estimator designed on the assumption that unknown parameters are the 
outputs of linear systems excited by white noise, can give good estimates 
even when this assumption is apparently unjustified. This is very im- 
portant since information about the statistical behavior of a parameter 
may be limited. This is also an indication of the power of the model 
which treats random parameters as the outputs of linear systems excited 
by white noise. In practice the time constants of these fictitious linear 
systems would be chosen to reflect the anticipated rapidity of the en- 
vironmental changes. 

The quality of the estimates depends on the difficulty of the 
problem; that is, on the presence of known inputs and on the noise levels. 
The examples presented, for the most part, represent cases of rather 
severe noise conditions in which satisfactory estimates were repeatedly 
given. A noisy noise annoys an oyster but apparently, the nonlinear 
estimator is a pearl of a different sort. 

The method of successive linearization converged rapidly to the 
solution of the nonlinear two-point boundary value problem in the cases 
investigated. The first approximation fx*(k In )J was excellent in all cases 
and could be used with confidence as the starting point in any method 
of successive approximations. 
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- 1*4 




A: Known input u(k) 

Max: 10.0 




B: Total input u(k) + w(k) 

Scale of A 




C: Observed output 2(k) 

Max: 16.5 



E>: State variable x-j (k) 

Max: 9.5 





F: 



E: Real Time Estimate x^(klk) 

Scale of D 



PLATE I 



Successive Linearization Xj_(k|n) 
i = 1, 2, 3 Scale of D 











-\u- 




G: State variable x ? (k) 

Max: 0.51 




H: Real time estimate x^(k|k) 

Scale of G 




I: Successive linearization x„ (k|n) 

i = 1 , 2, 3 Scale of G i 



PLATE I ( con't . ) 






- I2.W- 




A: Known input u(k) 

Max: 10.0 



B; Total input u(k) + w(k) 
Scale of A 




C: State variable x~(k) 

*2(k) =-1.5 



D; Real time estimate x^(,klk) 
Scale of C 




PLATE II 
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F: Observed output z(k) 

Max: 13.4 





G: State variable x 1 (k) 

Max: 4.4 



£ 

H: Real time estimate x-j(kik) 

Scale of G 





I: Successive linearization 

(kin) 

i=1 Scale of G 



J: Successive linearization 

x. (kin) 

1 i 

i = 2, 3 Scale of G 



PLATE II (con't . ) 







- 12.8 




A: Known input u(k) 

Max: 5.0 




B: Total input u(k) + w(k) 

Scale of A 




C: Observed output z(k) = x. (k) + v(k) 

Max: 4.97 




PLATE III 







- \ - 





H: c(k) = c Q + x^(k) 

Max: 8.85 , c 0 = 3.0 




J: (k) Scale of 0 




I: Real time estimate 

c*(klk) ■ c + xo(klk) 
Scale of H 




K: P* 2 (k) Scale of 0 




L: P*^(k) Scale of 0 




M: P^Ck) Scale of 0 




PLATE III ( con't . ) 
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A: Observed output z(k) = (k) + v(k) 

Max: 0.4 




D: State variable ^(k) 

Max: 0.2 



E: 



Real time estimate x (kik) 
Scale of D 2 




PLATE IV 












A; Known input u(k) 
Max: 5.0 



B: Total input u(k) + w(k) 

Scale of A 




C: Observed output z(k) = x. (k) + v(k) 

Max; 5.28 




D; State variable x. (k) E: Real time estimate x?(k|k) 

Scale of C Scale of C 



PLATE V 












F: State variable Xg(k) 

Max: 2.44 




G: Real time estimate x^(k|k) 

Scale of F 




H: State variable x~(k) 

Value 3.0 ' 




I: Real time estimate x^(klk) 

Scale of H ^ 






M: P^Ck) Scale of 0 





PLATE V ( con't . ) 










A: Known input u(k) 

Max: 5.0 



B: Total input u(k) + w(k) 

Scale of A 




C: Observed output z(k) = (k) + v(k) 




PLATE VI 
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F: State variable x 2 (k) 

Max; 1.4 




G: 



Real time estimate X2(klk) 
Scale of F 




H: State variable Xo(k) 

Max; 7.0 * 




I: Real time estimate xJ(k|k) 

Scale of H J 




J : c(k) = c 0 + x-j(k) 

Max: 9.0 , c Q = 2.0 




K: Real time estimate 

c*(klk) = c + x~(klk) 
Scale of J ^ 



PLATE VI ( con»t . ) 









CHAPTER VI 



CONCLUSION 

Our research has been devoted to a fundamental problem of con- 
siderable importance in modern control theory* The linear problem has 
been solved (Appendix D) and a scratch has been made in the armor of 
nonlinearity* 

The experimental results of Chapter V are extremely encouraging 
and indicate that the approximation techniques may have wide applicability. 
The results show that better performance can be expected when some inputs 
to the system are known* Such known inputs usually excite transients,, 
the effect of which can be measured at the output. These transients con- 
tain information about the state of the system and the values of various 
parameters. One can further speculate that if information about a parti- 
cular parameter is not available in the observations, then the performance 
of the system is relatively insensitive to variations of that parameter 
and the need for accurate estimation is diminished. 

The approximate solution to the nonlinear filtering problem is 
immediately applicable to adaptive control problems. Note that if measure- 
ments can be made of the input and the output of a subsystem containing a 
particularly troublesome parameter, then the order of the problem can be 
reduced significantly by treating only that subsystem. 

The assumption that various noises are Gaussian is equivalent to 
making a least-squares estimate when only means and covariances are known. 

Like most research programs, this report raises more questions than 
it answers. It is hoped that the following list will stimulate thought, 
discussion and future research. 
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A theoretical question of great practical importance is that of 
stability.. The stability of the linear filter has been shown by Kalman 
[29] using the direct method of Liapunov [4oJ. It seems likely that the 
stability of the nonlinear filter obtained by linearization about the 
present estimates can be proved under certain restrictions. What are 
these restrictions? 

It would be interesting to know the most general types of systems 
for which the probability density function (2.7) is unimodal. Convexity 
of (2.8) is a sufficient condition. 

A fast simple numerical procedure for solving the smoothing prohlem 
guaranteeing convergence would be of grgat value. 

The areas of observability and controllability are suitable for 
theoretical research. 

If a stationary random process is viewed as the output of a linear 
system excited by white noise, then the approximation techniques might 
be used to identify the parameters of this fictitious system. It would 
be interesting to compare the results of such an approach with the more 
conventional approach of measuring the correlation function or the spectrum 
and then approximating the spectrum by a rational polynomial. 

It seems likely that a combination of the method of successive 
linearization and the filtering technique could be used to obtain better 
estimates in real time. That is. instead of linearizing about the present 
estimate, one could smooth over the past N observations . This seems to 
be a real possibility in applications where sampling periods and computa- 
tion times are of the same order of magnitude. 

The development of approximation techniques for use in conjunction 
with the dynamic programming formulation of the discrete- time problem 
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would be an important contribution,, Such techniques would be immed- 
iately applicable to a wide range of dynamic programming problems . 

It would be interesting to apply the successive linearization 
technique to the nonlinear regulator problem discussed in section 3.8. 

It would be desirable to simplify the nonlinear filter as much 
as possible without impairing performance. Presetting the values of the 
diagonal elements of the P matrix is one possibility. Another possibility 
is to use the basic configuration of the filter but to set certain para- 
meters by the method of stochastic approximation [38, 36 J. 

Finally,, there is need for further simulation studies applying 
the methods of this report to more complicated systems. 
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APPENDIX A 



MATRIX IDENTITIES 



A1 Quadratic Forms 

We begin by summarizing a few facts about matrices so that 
they will be available for reference* 

The length (or Euclidian norm) of a vector x is simply 

|| X | = l/iCj 2 + o.o + 

Clearly, || x || > o for x ^ o* 

Let Q(x) be the quadratic form associated with a symmetric 
matrix A* That is, 

Q(x) = x'Ax = ZT a. xx. , a.. = a.. 

i,j ij 1 J J1 

A is said to be positive definite if Q(x) > o, for x ^ o* Clearly, 
Q(o) = Oo A is non-negative definite if Q(x) - o, for x ^ o* Let 
A and B be non-negative definite and let C = A + B 0 Then 



x'Cx = x" (A + B)x = x^Ax + x°Bx 



Thus, C is at least non-negative definite. G would always be positive 
definite if either A or B were positive definite and the other were non- 
negative definite* 

Let D be any matrix* Then D'D is non-negative definite, since 
x'D'Dx = | Dx if - o. If D is non-singular, then D'D is positive definite 
since in this case Dx = o implies x = o. 
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Conversely, any positive definite matrix A may be factored into 
a product of the form D'D where D is non-singular. To see this we re- 
call that the characteristic numbers of a positive definite matrix are 
positive and that any symmetric matrix may be reduced by an orhtogonal 
transformation to a diagonal matrix having its characteristic numbers 
on the diagonal. Let A be positive definite and let T be the appropri- 
ate orthogonal transformation (T* = T°^), Then 

T * AT = = diag ( \ ^ , « , , , \ , X ^ 3 o 

A suitable D' is T* diag( ^-j >/A n )« If A were non-negative 
definite D would be singular, since some of the terms on the diagonal 
of -A would be zero. If A were positive definite, then A”"* could be 
written in the following form; 

A” 1 = T diag(1 / A 1 .... .l/\ n )T» 

which shows that A“^ is positive definite. 

Finally, if A is non-negative, then H 1 AH is at least non- 
negative definite, since x”H , AHx =M DHx ||- o. 

In the opinion of the author, the use of block diagrams or flow 
graphs enables one to produce matrix identities far faster and more simply 
than by straight algebraic techniques. We will view a matrix as a linear 
operator. For example, the equation y = Ax we interpret as a system 
A operating on an input x to produce a unique output ;£, Similarly, the 
equation £ = ABx is given the interpretation that first B operates on 
x and then A operates on this result to produce See Fig, A-1 , 
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z = Ax 




Z_ = ABx 



Fig. A-1 



Some basic axioms of matrix theory and the associated block diagram 
manipulations are given in Fig. A~2. 




A + B = B + A 



(A + B)C = AC + BC 



C(A + B) = CA + CB 



if A" 1 exists 
A + B = [i + BA” 1 ] A 



if A” 1 exists 
A + B = A [I + JpB] 



Fig. A-2 
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The question of placing a matrix inside a feedback loop is 
slightly more complex but is of extreme importance. Consider the 



"system" of A-3« 




Fig. A-3 

This "system" implies the equation % = x-Ay or [ I + A ] y = x. Note 
that the output £ is specified uniquely by x only if [I + A] exists. 
Thus, the diagram of Fig. A-3 is valid only if [I + A]" 1 exists, since 
only then is the output uniquely specified by the input and in our 
basic concept of a system we require this. 

In order for a block diagram to represent a valid system the 
output of each individual element of the block diagram must be uniquely 
specified for each possible input. The block diagram manipulations of 
Fig. A- 2 may then be performed without affecting this uniqueness to 
obtain equivalent systems. 

Some examples will now be given to show how the ability to 
picture a matrix operation in the form of a block diagram is of great 
assistance. Important relations are obtained in Example 3® 

Example 1 . Show that 

[I + AB] ^ = [A" 1 + Bj^A” 1 if A” 1 exists 

and 

[i + ABj°^ = B + A] if B" 1 exists 



We use the visual aid of the block diagram of Fig. A-4, although a 
straightforward algebraic proof is simple for this simple problem. 




'fQ 


+ 

* ~ *1 


p — > 


B 










A 


« 



Fig. A-4 



* 



Note that in moving blocks around the loop it is clear when the 
additional assumption must be made as to the existence of some inverse. 

Example 2 . Consider the question , "Does the existence of [ I + ABl~ 
imply the existence of [i + M]” 1 ?" If so, determine the latter in 
terms of the former. We note that from Fig. A-5 [i + AB]” 1 A = A [I + BA] 
so the answer to the question is yes. 




Fig. A-5 



To obtain an expression for the inverse we introduce the quantity 
w = x-By. Then, by the definition of ^ in the diagram on the left side 
of Fig. A-5, 

w = [I - B [I + AB]" 1 A] x 



but, by the diagram on the right side of Fig. A-5, 
w = [I + BA]” 1 :. 
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for all x, Thus 



[I + BA]" 1 = I - B [I + AB]" 1 A (A.1 ) 

which may be verified by direct multiplication „ 

Example 3 . 'As a third example we shall develop two matrix identities 

-1 

of practical importance. Suppose R exists and consider the expression 
«*1 

PH 8 [ HPH 8 + R]~ . In Fig. A-6 we simply back things around the loop to 
obtain the first desired result by inspection. 

PH' [ HPH 8 + R]" 1 = [I + PH'Hh]’ 1 ffl'R" 1 (A*2) 




Fig. A=6 



Using (A. 2) we note that 

i - ph* [hph 8 + r]" 1 h = i - [i + ph 8 r~ 1 h]~ 1 PH'R" 1 H 

Rewriting I as a product, this relation is equal to 

[i + JS'iHh]" 1 [i + ph'r” 1 h ] - [i + ra-R" 1 !]" 1 ph'r^h 

Canceling terms, we obtain the identity 

I - rafHPH' + R]” 1 H = [I +PH , R" 1 H]~ 1 (A.3) 

This result may be verified by direct multiplication. Note that (A.3) 
could also have been obtained by substituting PH ' for B and R°°^H for A 



142 - 




in (A.1) and then bringing R _1 inside the inverse. 

The easy way in which the results (A.2) and (A. 3) were obtained 
using the graphic aid of the block diagram is an indication of the 
power of this approach both for obtaining identities and proving the 
existence of inverses. 
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APPENDIX B 



THE GENERALIZED INVERSE OF A MATRIX 
B1 Introduction 

The concept of the inverse of a matrix was first generalized by 
Moore [47, 48] to include rectangular and singular square matrices. 
Later, Penrose [50, 51] independently introduced the same basic concept. 
Further discussion of the generalized inverse was given by Greville 
[ 21 , 22 ] . 

Penrose proves that for any matrix A there is a unique matrix 

§ 

A satisfying the following four relations; 



AA # A = A 


(B.1) 


(AA # ) ' = AA # 


(B.2) 


if Alt = A # 


(B.3) 


(A*A) ' = A^A 


(B.4) 



and proceeds to discuss the properties of the generalized inverse. 

Kalman [29] defines a pseudo-inverse of a matrix A to be any 
matrix A^” satisfying (B.1 ) and shows how such a pseudo-inverse may be 
used in linear filtering and prediction problems. 

This appendix is intended to serve as an introduction to the 
notions of generalized inverses and pseudo-inverses. We discuss their 
use in connection with the solution of linear algebraic equations. A 
geometrical interpretation of the generalized inverse in terms of orth- 
ogonal projections is presented. For simplicity, the discussion is 
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limited to real matrices 



B2 The Equation Ax = y 

Consider the set of linear algebraic equations; 

a n x i + • • • + a iA = yi 



a m1 x 1 + • • • + Wn = < B -5> 

or, equivalently, the matrix equation Ax = y, where A is a known m by n 

matrix, £ is a known m- vector, and x is an n- vector to be determined. 

If there is no vector x which satisfies this relation, the set 

(B.5) is said to be inconsistent. Otherwise, the set is consistent and 

there may be a unique solution or an infinite number of solutions. 

Suppose a solution exists and we decide to choose the smallest 

' 2 

solution; that is, the solution for which |[x || or, equivalently, \ ||x || , 
is minimum. If we introduce a vector of Lagrange multipliers and treat 
(B ,5) as a constraint, we may minimize 

j x’x +2" (y ~ Ax ) 

Setting the partial derivatives of this expression with respect to x 
and A equal to zero yields the following two equations; 



x = A"2 


(B.6) 


If 


(B.7) 


Combining these equations, we obtain 




Z ~ M' A 


(B.8) 
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If AA* is non-singular we may solve this equation for \ and combine 
with (B.6) to obtain the desired solution; 

x = A' (AA 1 

One can easily verify that if A' (AA* J” 1 exists, then it satisfies (B.1) 

through (B.4) and hence, is the generalized inverse. 

Suppose, on the other hand, that the set (B.5) is inconsistent. 

Then one might wish to make a least squares approximation by minimizing 
2 

|| Ax - £|| . If we set the derivative of this expression equal to zero, 
we obtain 

A' Ax = A"2; 

If AlA is non-singular we may solve this expression for x to obtain the 
desired result, 

x = (A'A)" 1 A'^ 

If (A 1 A) A 1 esists, it satisfies (B.1) through (B.4) and hence, is the 
generalized inverse. 

B3 Ihe Pseudo-Inverse 

A pseudo-inverse of a matrix A is defined [29] as any matrix A**" 
satisfying the relation 

AA + A = A 

It follows from (B.1 2) that (A 1 )”*' = (A^) 1 and that if A” 1 exists, it 
is equal to A"*’. 

The following procedure , similar to that given in [29] , may be 



(B.9) 



(B.1 0) 



(B.1 1 ) 



(B.12) 
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used to obtain a pseudo-inverse for any matrix. We first consider the 
case in which A is square. It is well known [ 17 ] that any square matrix 
may be reduced to a diagonal canonical form by a similarity transformation, 

PAQ = E 

where P and £ are non-singular and E is a diagonal matrix having only 
zeros and ones on the diagonal. Then, 

PAQPAQ = E 2 = E 



and 



AQPA = P^ECf 1 = A 



Hence, £P satisfies (B.12) and is a pseudo-inverse of A. Since 
1 *1 

£ and P exist, a non-singular pseudo-inverse may always be found 
for a square matrix. 

To obtain a pseudo-inverse of a non-square matrix we make use 
of the fact that either of the following relations imply that B is a 
pseudo-inverse of As 



( ABA - A) (ABA -A) 1 =0 (B.13) 

( ABA - A)' (ABA - A) = 0 (B.14) 

Direct substitution into (B.13) and (B.14) respectively will verify that 
the following expressions are pseudo-inverses of A; 

A'(M') t (B.15) 

(A'A) + A* (B.1 6) 
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The pseudo- inverse in (B»1 5) and (B.l6) may be computed by the first 
method since (A 1 A) and (AA* ) are square. 

Since (B.15) and (B,l6) are, in general, not the same, we see that 
the pseudo-inverse is not, in general, unique. In fact, if B is a pseudo- 
inverse of A, then so are 



where M is an arbitrary matrix. 

The pseudo-inverse may be used in connection with the solution 
of equations such as (B.5) because it has the property that any pseudo- 
inverse of A may be used to test for the existence of a solution and 
any pseudo- inverse of A may be used to obtain a solution when one exists. 
This property is summarized in the following theorem which is similar 
to one proved by Penrose for the generalized inverse. 

Theorem ; The equation jr = Ax, where y and A are known, has a solution 
if and only if, for any A**" satisfying (B.12), the following relation 
is satisfied; 



B + (I - BA)M 



and 



B + M(I - AB) 




(B.17) 



If (B.17) is satisfied for some pseudo- inverse A^", then it is satisfied 
for every pseudo- inverse of A and x = A^ is a solution of (B.5) for every 
pseudo-inverse of A. 



Proof ; i.,(if). Let A^ be any pseudo-inverse for which (B.17) holds 

Then A^ £ is a solution of (B.5).. 



o 



ii.,(only if). Suppose x^ is a solution of (B.5). Then, 
Z = ^1 ” AA^Ax^ = AA^y 
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for every ps eudo- inverse of A. 

iii., Let a£ be any pseudo-inverse for which (B.17) is satisfied and 
let A^ be any pseudo-inverse for which (B.17) is not satisfied. Then, 

z^aa+z = aa Jm} 2 = AA+2 

which is a contradiction. Hence, if (B.17) holds for some pseudo- in- 
verse, then it holds for every pseudo-inverse and a"^ is a solution of 
(B. 5) for every pseudo- inverse of A. 

Let us return to the problem of finding the solution of (B.5) 
having the smallest norm. We are faced with the problem of determining 
a Lagrange multiplier \ which satisfies (B.8). Such a Lagrange multi- 
plier will exist by virtue of the theorem, if and only if, 

(B.18) 

Noting that A* (AA* is a pseudo-inverse of A, it follows from the theorem 
that (B.18) will be satisfied and (B.8) will always have a solution if 
(B.5) has a solution, which was the original hypothesis. Hence, we may 
vise any (AA*)^ to obtain 

1 ~ (M')^k 

and finally, using (B.6), the smallest solution of (B.5) is given by 

x = AKAA')"*Z (B.19) 

This result differs from (B.9) in that we did not have to assume 
the existence of (AA’)^® It is interesting to observe that A* (AA* )^ 
satisfies (B„3) and (B.4) as well as (B.1). 

If we return to the problem of finding the least squares approx- 
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imation , we are faced with finding an x which satisfies (B.1 0). Such 
an x will always exist and we may use any (A'A)^ to obtain 

x = (A , A)^A , £ (B.20) 

+ 

which is similar to (B.10). We observe that (A’A) A' satisfies (B 0 2) 
and (B.3) as well as (B,1). 

B4 The Generalized Inverse 

The generalized inverse of a matrix A is the unique matrix sat- 
isfying (B.1) through (B„4)„ In the previous section we observed that 
A* (AA 8 )"*’ and (A 8 A)^A 8 each satisfied three out of the four of these re- 
lations. Hence, it is not surprising that the generalized inverse may 
be written in forms resembling these expressions. 

We first consider the case in which A is symmetric. Then A may 
be reduced by an orthogonal transformation to a diagonal matrix having 
the characteristic roots of A on the diagonal. 

T 8 AT = = diag(A 1 ,...,\ n ) 

JL 

We form from A by replacing by 1 /x^, for £ o, and leaving 

unchanged the terms of A which are equal to zero. Then, 

A* = T Ajv (B.21) 

It is simple to verify that A F given by (B.21) satisfies (B.1) through 
(B„4)„ Since AA 8 and A 8 A are symmetric, we may find (AA 8 )^ and (A 8 A)^ 
in this manner. One can easily verify that the following two expressions 
satisfy (B.l) through (B.4) and hence, are equivalent expressions for the 
generalized inverse of A; 
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A' (AA* ) 



(B,22) 



(A'A) # A» (B.23) 

It is clear that the generalized inverse is a particular pseudo- 
inverse and hence, may be used to obtain solutions to linear equations 
when solutions exist. The generalized inverse, in addition, has the 
two important properties of always giving the smallest solution when a 
solution exists and giving the best least squares approximation when no 
solution exists. These properties are summarized in the following theorem 
given by Penrose [51 ] which we state without proof. 

Theorem ; Let || A || = trace A' A, Xq is said to be the best approximate 

I 

solution to the matrix equation AX = Y, if, for all X, either 
II AX - Y || > || Mo - I II 

or || M - Y || = || A^ - Y || and || X |l * || II 

Then A^Y is the unique best approximate solution to the equation AX = Y, 



An alternate expression for the generalized inverse may be obtained 
in the following manner. Reconsider the problem of finding a best least 
squares approximation to the solution of the equation Ax = y. We found 
previously that any x satisfying A 'Ax = A’% would do, but we shall now 
seek the smallest x satisfying this relation. We minimize the expression 
2 

2 || x || + A* (A' Ax - A'^) (B.24) 

Setting the partial derivatives of this expression, with respect to x 
and A, equal to zero we obtain the following two equations; 
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These may be combined to obtain 



A‘AA J A A, = A*£ 

Using a pseudo-inverse to solve for we obtain 

! = (a'M'ajVz 

Combining (B.25) and (B»26) yields 
x = A'A(A'AA , A) t A«2: 

The matrix A'A^AA'A^A' may be shown to satisfy (B.1 ) through (B„4) 

A 

and hence, is an expression for A , In verifying this fact it is help- 
ful to note that A'A(A 8 M.'A)^ is a pseudo-inverse of A' A by (B„15). We 
omit the details,, Then, 

A # = A«A(A 8 M'A)tA' 

A 

There is also a related expression for A" given by 
= A' (AA« AA« )tAA' 

Suggestions for computing the generalized inverse are given in [6, 22, 

51 ] . 

B5 A Geometrical Interpretation of the Generalized Inverse 

In order to give the generalized inverse a simple interpretation 



(B.26) 



(Be27) 



(B.28) 



(B.29) 



in terms of orthogonal projections we will need some basic concepts about 



linear transformations and vector spaces such as may be found in [ 2 ] . 
We will view the m by n matrix A as a set of column vectors 
a^ , . If the equation y = Ax is written in the following form, 

emphasizing this point of view. 



i 


t 1 " 




x i 




1 




1 1 
° ° ° — »n 




0 

o 


= 


1 


1 I 1 




o 

o 




> 






*n 

- J 







it is clear that the equation will have a solution if and only if, ^ 
is a linear combination of the column vectors of A; in which case, ^ 
is said to belong to the range of A, ^£R(A). In any case, we may de- 
compose % into the sum £ = jTj + 3^, where jrj e R(A) and Zq as orthogonal 
to the column vectors of A. A vector ^ with this property is said to 
belong to the null space of A", N(A'), since A'^ = o„ R(A) and N(A') 

are called orthogonal complementary subspaces of m dimensional vector 
space, E™ — orthogonal, since any member of one is orthogonal to any 
member of the other ~ complementary, since their union is E 131 — subspaces, 
since any linear combination of vectors belonging to R(A) (or N(A* )) also 
belongs to R(A) (or N(A' ) ) . The intersection of R(A) and N(A') is the 
zero vector. 

Suppose that S and T are orthogonal complementary subspaces of 
E 01 . Consider a transformation P on E 01 such that if we E 01 and w = u + v, 
where ue S and v a T, then 



Pw = u 



Such a transformation P is called the orthogonal projection of E 111 onto S, 
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Necessary and sufficient conditions that P be an orthogonal pro- 
jection are 




P» = P 



We use the notation -R(A) to signify the orthogonal projection onto R(A). 

Let us turn our attention to the diagram of Fig . B-1 . This figure 
should be interpreted in the following manner; any point in the lower 
plane (E 11 ) is mapped in the direction of the arrow by the transformation 
A to a point in R(A) in the upper plane (E 01 ) e 





Fig. B-1 
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Consider the equation Ax = £, where ^ is shown in Fig . B-1 . 

This equation has no solution since ^ 4 R(A). The point nearest to £ 
for which this equation has a solution is ^ , where ^ If we 

now consider instead the equation Ax = jr-j , we see that any point x on 
the line of Fig. B-1 marked Ax = ^ is a solution of this equation. 

From these solutions we choose X| , the one closest to the origin. Note 

it 

that x.| £ R ( A 1 ) . We would like to have a transformation A F which would 
perform these operations. Then x^ would be given by 



xi = A ff £ 

A 

Let us consider the properties such an A F would possess. First, 

JL JL 

R(A )C R(A') since every point y_ would be mapped by A" into a point 
belonging to R(A'). Second, if x^ is to be given by (B.30), we would 
require Axj to be That is, for all £ 

= hup 



(B.30) 



or 



= hu) 



(B.31) 



J. 



Third, if Ax = y, | , then A ^ would be ^x. That is, for all x 



A^=P r( } x 



or 



f A = P 
“ “R(A» ) 



(B.32) 
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The condition (B.32) implies that R(A')CR(/) since, if x e R(A* ), 
then A^Ax = x and x eR(A^). We now have the relation* R(A^)c R(A’ ) and 
R(A')C R(A^) which implies R(A') = R(Af). Thus, (B.32) becomes 




(B.33) 



The condition (B.31 ) is equivalent to (B.1) and (B.2), and the condition 
(B.33) is equivalent to (B.3) and (B.4) „ Hence, all the properties of 
the generalized inverse follow from (B.31 ) and (B.33) and are easily 
given a geometric interpretation as is done in Fig. B-1 . 

B6 Further Discussion of the Equation Ax = y 

In this section we shall prove three simple lemmas which will be 
used repeatedly in Chapter IV. We again consider the linear equation 



where £ and A are known and x is to be determined. 

B6.1 Lemma ; 

The solution of (6.3^) is unique if and only if A' A is positive 
definite. If A* A is positive definite the unique solution is 



Ax = % 



(b.3*0 



x = [A'A]-A' Z 



(B.35) 



Proof ; Let and x be any two solutions of (B.34). Then 



£ = A [x t - x 2 ] 



and 
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Ui - 4] ^ [2£t - 



o 



(B.36) 



If A' A is positive definite (B.36) implies x.j = X£ the solution of 
(B.34) is unique. If A' A is not positive definite, let x^ be any non- 
zero vector such that x^ A 1 Ax Q = o, (i.e., Xq £ N(A)). If Xj is a solution 
of (B. 34) then so is Xj + and the solution of (B.34) is not unique. 

The solution is given by (B.35) since [A* A]”'* A* is the generalized inverse 
of A. 



B6.2 Lemma ; 

A solution of (B.34) will exist if and only if £ e R(AA' ); in which 
case the solution of smallest Euclidian norm is given by 

x = A 8 (AA 1 )^y (B.37) 

Proof ; Using the theorem of section B3 and recalling that A* (AA 1 ) # = A#, 
we observe that (B.34) w ill have a solution if and only if 

Z ~ AA^y - AA* (AA* )^r 

Hence, (B.34) will have a solution only if ^ 4 R(AA 5 ). Moreover, by the 
same theorem, £ = AA'A has a solution, (i.e., £ £ R(M' )» if and on ly if 

£ = AA* (AA 5 

The solution of smallest Euclidian norm is given by (B.37) since A* (AA 5 )# = A^„ 
B6.3 Lemma ; 

A solution of (B.34) will exist for all £ if and only if AA* is 
positive definite; in which case the solution of smallest Euclidian norm 
is given by 
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x = A' (AA 1 

Proof ; A solution of (B.3*0 will exist for all % if and only if all 
^ £ R(A) , or equivalently if and only if N(A’) = o. But a vector ^ 
belongs to N(A') if and only if ^ AA'yy = o. Thus, N(A*) = o if and 
only if AA' is positive definite. 
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APPENDIX C 



GAUSSIAN RANDOM VECTORS 



In this appendix we summarize some known facts about Gaussian 
random vectors or, equivalently, about the multivariate normal dis- 
tribution „ Special attention is given to situations in which the 
covariance matrix is singular. The pseudo-inverse and generalized 
inverse of Appendix B are shown to be applicable to these situations. 

If x is an n-dimensional random vector with mean m and covariance 
matrix V = E [ (x-m) (x-m) * ] , we shall say that x is a Gaussian random 
vector if its characteristic function is of the following form; 



C x (iw) = E [exp iw'x J 



exp 



iw'm 



1 

2 



w*Vw J 



(C.1) 



If V is non-singular (positive definite) the probability density function 
for x takes the following form; 



f(x) 



1 





exp 



2 -I 

T II *-£ II « 

v J 



(C.2) 



The definition of a Gaussian random vector in terms of the 
characteristic function has been suggested [29] in order to avoid the 
difficulties which arise when V is singular. The results to be presented 
in this appendix have been derived elsewhere [29] using a different 
approach based on the characteristic function. We shall use an approach 
which is more in keeping with the body of this work. 

A vector u will be called a normalized Gaussian random vector if 
it is Gaussian with mean E(u) = o and covariance matrix E(uu*) = I. Any 



159 - 



Gaussian random vector x can be considered as having been obtained from 
a normalized Gaussian random vector of the same dimension by a linear 
transformation of the following form; 

x=m = Tu 



Then, 



E(x) = m + T E(u) = m 

E [(x-m)(x-m«)] = TE(uu') T' = TT' = V 

It is shown in Appendix A that a non-negative definite matrix V may 
always be factored into a product TT'. It will suffice for our purpose 
to know that the transformation T exists, since we shall never have to 
compute T» 

Let x-| *2 ^ Gaussian random vectors having the joint distri- 

bution specified by 



'Si' 


m = 


Hi ' 


V = 


in 


Il2 _ 


.22. 




IS 

K> 




A. 


Jr* 

L. 



where V may be singular. 

We consider x to be the result of a transformation on a norm- 
alized Gaussian vector u. Then, 

x-m = Tu , TT' = V (C.3) 

Let H be the matrix [0 I] so that x^ = Hx. Suppose that an experiment 
is performed in which it is learned that the value of x, is z e Having 
learned the value of Xg we wan "k to know the a posteriori distribution 
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for x 1 . The a posteriori distribution for will be Gaussian and 
hence, will be specified by its mean and its covariance matrix. To 
determine the a posteriori mean we minimize 
2 

£ 1| u || + (z-Hm-HTu) 

That is, we locate the mode (or mean) of the a posteriori distribution 
for u. The Lagrange multiplier has been introduced to insure that Xg 
takes on the observed value z. Minimizing (C.4) is exactly the same 
type of problem as was discussed in Appendix B. Proceeding as before, 
we set the partial derivatives, with respect to u and _)» of (C.^) equal 
to zero to obtain the following two equations for the minimizing value ; 

HTu = z~Hm 

u = T'H« A 

Combining {0.5) and (C.6) we obtain 

HTT'H’ A = 

As was shown in Appendix B, a \ satisfying (C.7) will always exist if 
there is a u which satisfies {0.5) c There will be a u which satisfies 
{0.5) unless the observed value of Xg has probability zero; that is, 
unless the singularity of the a priori covariance matrix V makes the 
observed value inconsistent with prior knowledge. We rule out this 
possibility since it leads to division by zero in Bayes 1 Theorem and 
violates our b&sic assumption. Values of Xg f° r which a solution to 
{0.5) exists will be called consistent. Substituting for H and TT* 
in (C.7) yields 



(C.4) 



{0.5) 

{ 0 . 6 ) 

(C.7) 
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(0.8) 



i 22 A = s-s 2 

Any pseudo-inverse may be used to solve for A,. Then, 

A = £ 2 (£ - S 2 ) 

Combining (C.9) and (C„6) gives the following equation; 

u = T'H'V2 2 (z-m 2 ) 

Finally, using (C„3)» we obtain 
x-m = VH'V2 2 (z-m 2 ) 

This is equivalent to the two equations 

Si s Bi + V 12 4 2 (z-m 2 ) 

4 = £2 + X 22 X22 ( £”£2) 



(C.9) 



(C. 10 ) 



(Coll) 

(C012) 



From the pseudo-inverse theorem of Appendix B we know that the existence 
of a solution of (0,8) implies the following relation; 

z-m 2 = VggVj^Cz -m 2 ) (C.13) 

Combining (C.12) and (C»13) leads us to the consistent result x 2 = z . 

The basic result is (C„11) and is summarized in the following statement* 



Statement C-1 ; The conditional expectation of x^ » given a consistent x 2 
is given by the following relation; 

5l = -1 

We may obtain a useful relation from (C.13) by using the fact 
that for all consistent values of x 2 the following relation is satisfied; 
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Postmultiplying both sides of (C.14) by and averaging with re- 

spect to the joint a priori distribution yields 




t 

V V V 
- 22 - 22-21 



(C.15) 



Taking the transpose of (C.15) gives the following useful relation; 



V = V V 
^12 - 1 2 ^ 22^22 



(C.16) 



In the development of an expression for the covariance matrix 
of the a posteriori distribution for x^ , given x 2 # the concept of pre- 
posterior analysis [5^-] will be used. Given any value of x 2 we may find 
the a posteriori distribution for x^ which will, in general, depend on 
the value of x 2 » Before the value of x 2 i s known, parameters of the a 
posteriori distribution may be considered to be random variables. Pre- 
posterior analysis is concerned with the a priori probability of obtaining 
particular values for parameters of the a posteriori distribution. We 
shall need the following simple lemma; 

Lemma ; Let E [0(x . 1 )] be the expected value of some function 0(xj ) 

22 -| 1 2^2 

with respect to the a posteriori distribution. Then the expected value 

of E [0(x.)] with respect to the a priori distribution for x» is the 
--| 1-2 

a priori expected value of 0(x^). That is. 




-I 1 -2 



[ 0(22i )J 



E [0(20] 

2l 



The proof is by inspection since on the left side of this expression 



the average is with respect to the joint a priori distribution for x.j 
and * 2 ° 

Using this lemma we may easily prove the following theorem given 
in [54]. 

Theorem ; The expected value with respect to x^ of the covariance 
matrix of the a posteriori distribution plus the covariance matrix 
with respect to Xg of the mean of the a posteriori distribution is .equal 
to the covariance matrix of the a priori distribution for x^ „ That is , 



E 

*2 



e rxx n 

L — i — i J 



)Z\\Z: 



x, | x 
-I 1 -2 



(x 1 ) 



XI X 

~ l '“2 






+ E 
^2 



x„ I2L-. 
-1 1 ~2 



(x,) 



E 

L x lx 
'“2 



(Xj) 



*2 




L2i|2 



= E 

x, 

“1 



[ x,x, 



X, 

“n 



[2,] 




I 



The proof is immediate by canceling like terms and applying the lemma. 
This theorem is sometimes stated for scalar random variables in the 
following concise form, The mean of the posterior variance plus the 
variance of the posterior mean is equal to the prior variance. 

Statement C-2 ; The covariance matrix with respect to x^ of the mean of 
the a posteriori distribution for x^ is ° This statement 

follows from C~1 since 

E C(x 1 =m)(x i =m, )•] = V vj V V + V 

L — 1 — r — | — 1 J “ 1 2 “ 22 “ 22 ~ 22“21 

r2 

Using (Col 5) or (C,l6) concludes the proof. 
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Statement C-3 ; 



x, - E (x ) is independent of x • 

— 1 ^ —2 

To show this we compute the covariance matrix of this quantity 

and x , From Statement C-1 we have 

-2 



E [(Xj - E 



-1 1-2 ^ 1 ' —2 



= E ([x, -ni - V\ vt. (x -m )] (x 0 -m )'] = V - V V^" V 
y | X 1 L_1 “1 2-22 -2 “2 J “2 -2 J ~I2 “f2”22“22 



1 —2 



This quantity is equal to 0 by (C,1 6), 

This statement implies that the covariance matrix of the a poster- 
iori distribution for x^ is independent of the observed value of x^ and 
hence, the expected value with respect to Xg this quantity is equal 
to the quantity itself. Using this fact and the theorem leads to the 
following statement. 



Statement C-4 ; The covariance matrix of the a posteriori distribution 
forx, given % is V,, 

The generalized inverse may, of course, always be used in place 
of the pseudo-inverse. The use of the pseudo-inverse has been consistent 
with our effort to use a notation compatible with [29] « In practice one 
might prefer to use the unique generalized inverse. 
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APPENDIX D 



GENERAL SOLUTION OF THE DISCRETE-TIME LINEAR ESTIMATION PROBLEM 



D-1 Introduction 

This appendix is devoted to a study of a very general form of 
the linear discrete- time estimation problem in which the covariance 
matrices may be singular. For the sake of generality and at the sacrifice 
of some simplicity, we allow correlation between the random input w and 
the measurement noise v. Necessary background material appears in Chapter 
II and Appendices B and C. 

D-2 Problem Statement 



Consider the system 



x(k+1 ) = F(k)x(k) + G(k)w(k) 



z(k) = H(k)x(k) + v(k) 



05-1 ) 



(D.2) 



where x is the state vector and z is the observation. | w(k)|, the random 
input and [v(k)] f the measurement noise, are sequences of independent ran- 
dom variables with zero means and the following covariance matrices; 



E [w(j) w' (k)] = £(k) 

E [i(J) I'OO] = R(k) <^ k 
E [w(j) v*(k)] = S(k) J jk 



for all intergers j and k 



Based on a finite observation sequence { z.(o), . . . ,z(n)j , we seek an 
estimate of the sequence of states |x(o),... ,x(n+m)] , 
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D.3 Filtering and Prediction 

Suppose that, given the observed sequence { z(o) , ,z(k-1 )} , 
the distribution for x(k) has mean x(klk-l) and covariance matrix P(k) 
Consider the vector 



x(k) 


, 


x(k) 


with mean 


x(k|k-1 ) 


z(k) 




H(k)x(k) + v(k) 




H(k)x(k|k-1 ) 



and having the following covariance matrix; 

7 ~ 

P(k) P(k)H* (k) 

H(k)P(k) H(k)P(k)H* (k) + R(k) 

If z(k) is observed, we may use statements (C=1 ) and (C-4) from 
Appendix C to obtain the following expressions for the mean x(klk) 
and covariance matrix C(k) of the distribution for x(k) given { 2 ( 0 ),.. 

x(k | k) = x(k |k=1 ) + P(k)H'(k) [H(k)P(k)H*(k) + R(k)J + 

[ z(k)-H(k)x(k|k-1 )] 

C(k) = P(k)-P(k)H* (k) [H(k)P(k)H'(k) + R(k)] t H(k)P(k) 



Similarly, consider the vector 



x(k+1 ) 




2(k) 





F(k)x(k) + fl(k)w(kj 
H(k)x(k) + v(k) 



with mean 



F(k)x(k|k=*1 ) 
H(k)x(k|k=1) 



and having the following covariance matrix; 

F(k)P(k)F« (k) + G(k)£(k)G> (k) F(k)P(k)H» (k) + G(k)S(k) 

H(k)P(k)F« (k) + S' (k)G» (k) H(k)P(k)H* (k) + R(k) 



»a(k)J 

(D.3) 

(D.4) 
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If z(k) is observed we may again use statements (C-1 ) and (C-4) to 
obtain the following expressions for the mean x(k+1lk) and covariance 
matrix P(k+1 ) of the distribution for x(k+1 ) given [ £(o)»..o »z_(k)j ; 

x(k +1| k) = F(k)£(k|k~1) + fF(k)P(k)H ! (k) + G(k)S(k)] 

[H(k)P(k)H'(k) + R(k)J f [z(k)-H(k)x(k|k»1)] (D.5) 

P(k+1 ) = F(k)P(k)F ( (k) + G(k)£(k)G' (k) 

- [ F(k)P(k)H' (k) + G(k)S(k)] [H(k)P(k)H* (k) + R(k)] f 

[H(k)P(k)F'(k) + S‘(k)G'(k)] (D.6) 



With practically no effort we have obtained the solution to the 
linear filtering problem and the solution to the problem of predicting 
one step ahead. Equations (D.5) and (D.6) are,, except for slight details, 
the same ones as those given in [29] . 

Note that x(k+1 |k) is not simply an extrapolation of x(k|k). 

This is due to the fact that information about w(k) is available in 
z(k) because w(k) and v(k) are correlated. However. x(k+2|k) and pre~ 
dictions further into the future are simply extrapolations of x(k+1 I k) „ 
since no information is available about future values of w. Since it 
is now evident how to make predictions, we may confine ourselves to the 
problem of estimating [ x(o).... „x(n+1 )j given {£(o)..„„,z(n)j. 

D4- Preliminary Considerations 



Let r(k) = G(k)w(k). and consider the vector 



r(k) 

l(k) 



with covariance matrix 



’ G(k)£(k)G • (k) G(k)S(k) 
S«(k)G»(k) R(k) 
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We consider the vector formed by combining r and v to have been 
obtained by a transformation on a normalized vector u(k), E[u(j)u' (k)J = I 



"r(k) ‘ 




~G(k)fi(k)G'(k) G(k)S(k)‘ 




= T(k)u(k) , T(k)T'(k) 




v(k) 




S' (k)G' (k) R(k) 



Let us introduce the matrices V = [0,1 ] and W = [l,0] such that 



r(k) = WT(k)u(k) 


(D.7) 


v(k) = VT(k)u(k) 


(D.8) 


WT(k)T» (k)W' = G(k)fi(k)G' (k) 


(D.9) 


VT(k)T» (k)V' = R(k) 


(D e 1 0) 


WT(k)T' (k)V' = G(k)S(k) 


(Dell ) 



The a priori distribution for the initial state is Gaussian with 
mean m and covariance matrix P(o) which may be singular,, We shall con- 
sider the initial state as having been obtained by a transformation on a 
normalized vector 0 

x(o) - m = A6 M« = P(o) (D.12) 

D5 Derivation of the General Solution 

Introducing the Lagrange multiplier vectors .6,, A(k) and ^(k), we 
may miniinize the following expression; 

2 n 

I n1 =i||e|| + i' [x(o)-m-A0] + £{i'(k) [z(k)-H(k)x(k)-VT(k)u(k)] 

* k=o 

+ il|u(k)|| 2 + 2'(k) [x(k+1)-F(k)x(k)-WT(k)u(k)]} (D.13) 
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Setting equal to zero the partial derivatives of I 1t with respect to u(k), 

n 9 1 

!(k), (for ^o,...^); x(k) (for k= o»..etn+l); and 0, yields the follow- 



ing relations; 

0 = A» 0 (D.1 4) 

£ = H' (o)je(o) + F» (o)i(o) (D .15) 

u = T' (k)V'f(k) + T* (k)W' A(k) (D.1 6 ) 

A(k-1 ) = F» (k)A(k) + H* (k)i(k) k=1,...,n (D.17) 

A^(n) = o (D.1 8) 

x(o|n) = m + A© (D.1 9) 

x(k+1 |n) = F(k)x(kln) + WT(k)u(k) k=o,...,n (D.20) 

z(k) = H(k)x(k|n) + VT(k)u(k) (D.21 ) 

Combining (D.14), (D.1 5) and (D.19) and observing the AA' = P(o) yields 

x(o|n) = m + P(o)H ! (o)^(o) + P(o)F s (o)A(o) (D.22) 

Combining (D.1 6 ) and (D.20) and using (D.9) and (D.11), we obtain the 
following equation; 

x(k+1 In) = F(k)x(k In) + G(k)S(k)j>(k) + G(k)£(k)G» (k)A(k) (D.23) 



Using (D.10), (D„11) and (D.l6) s we rewrite (D.21) in the following form; 

z(k) = H(k)x(k|n) + R(k)9(k) + S' 1 (k)G» (k)A(k) (D.24) 

We now have a two-point boundary value problem given by (D.23) and 
(D.1?) and the constraint (D.24). The boundary conditions are given by 
(D.1 8) and (D.22). 

- 1 ? 0 - 



In order to simplify future equations we define the following 
quantities ; 

B(k) = [H(k)P(k)H'(k) + R(k)] # (D.25) 

M(k) = B(k) [ H(k)P(k)F» (k) +S'(k)G'(k)] (D.26) 

We begin by considering the boundary condition for x(oln). Com- 
bining (D„ 22) and (D.24) yields 

z(o) = H(o)ra + [ H(o)P(o)H* (o) +R(o)] £(o) 

+ [H(o)P(o)F ! (o) + S'(o)G'(o)] A(o) (D 0 27 ) 

Using the generalized inverse to solve for a £(o) yields 

£(o) = B(o) [ z.(o)-H(o)m ] - M(o)X(o) (D.28) 

Note that any pseudo-inverse could have been used to solve for £_(o)„ 

The generalized inverse is convenient because it allows us to define 
B(k) uniquely. Combining (D.28) and (D.22) yields 

x(o|n) = m + P(o)H* (o)B(o) [ z,(o)-H(o)m ] 

+ [ P(o)F»(o)-P(o)H'(o)M(o)] \(o) (D.29) 

or, equivalently, 

x(oln) = x(o I o) + Lp(o)F" ( o)-P(o)H* ( o)M(o)] _\(o) (D.30) 

It is reassuring to note that for the special case S = 0, (D.30) reduces 
to the following familiar equation of Chapter II; 

x(o|n) = x(olo) + C(o)F ! (o)_)(o) ,S(o) = 0 (2.41) 
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We now hypothesize that the general solution is of the following 

form; 

x(k|n) = x(klk) + fP(k)F'(k) - P(k)H« (k)M(k) 1 \(k) ,k=0,oo.»n (D.31 ) 

A(k-1) = [ F* (k)-H' (k)M(k)] A(k) + H« (k)B(k) [ z (k)-H(k)x(k Ik-1 )] 

(D.32) 

where x(klk), x(k+ll k) and P(k) are given by (Do3)» (Do5) and (D»6)„ The 
hypothesis (D,32) is equivalent to requiring that £_(k) be given by the 
following equation; 

? (k) = B(k) [ z(k)-H(k)x(k|k-l)] - M(k)\(k) (Dc33) 

We have already shown that (Do 31 ) and (D # 33) are satisfied for 
k=o. To establish a proof by induction we need to show that if (D„31 ) 
and (D»33) are satisfied for some k£ (o, ». » ,n-1 ), then (D 0 31 ) and (D.33) 
are satisfied for k+1 . 

Suppose that (D*31 ) and (Do33) are satisfied for some ke (o,...,n-1 ); 
then from (D 0 23) we have the following relation; 

x (k+1 1 n) = F(k)x(k| k) +[F(k)P(k)F» (k) - F(k)P(k)H» (k)M(k) 

+ G(k)£(k)G»(k) - G(k)S(k)M(k)] X(k) 

+ G(k)S(k)B(k) [ z(k) - H(k)x(klk~1)] (D.34) 

Using (D,3)» (Do5) and (D 0 6) ( we may rewrite (D e 3^0 in the follow- 
ing form; 

x(k+1 In) = x(k+1lk) + P(k+1 ) \_(k) (D # 35) 

Substituting for _X(k) from (D 0 17) yields 
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x(k+1 1 n) = x(k+1 1 k) + P(k+1 )H ' (k+1 )£(k+1 ) + P(k+1 )F' (k+1 )^(k+1 ) (D.36) 
From (D.24) we may write 

z (k+1 ) = H(k+1 )x(k+1 1 k) + [ H(k+1 )P(k+1 )H' (k+1 ) + R(k+1 )] £(k+1 ) 

+ [ H(k+1 )P(k+1 )F> (k+1 ) + S' (k+1 )G' (k+1 )J \(k+1 ) (D.3 7) 

Using the generalized inverse to solve for a £(k+1 ) we obtain the 
following relation which satisfies the hypothesis, (D„37)» 

£(k+1 ) = B(k+1 ) [ 2 (k+1 )-H(k+1 )x(k+1 1 k)] - M(k+1 ) X(k+1 ) 

Combining (D„38) and (D„36) and using (D 0 3)» we obtain the following 
equation which satisfies the hypothesis (D.31 ) and completes the proof; 

x(k+1 1 n) = x(k+1 I k+1 ) + [ P(k+1 )F> (k+1 )-P(k+1 )H' (k+1 )M(k+1 )] A(k+1 ) 

(D.39) 

We may summarize these results in the following theorem; 

Theorem ; The general solution of the linear estimation problem as 
specified by the two-point boundary value problem (D.23), (D„17), (D„24) „ 

(D,, 18) and (D.22) is given by the following recurrence relations; 

x(k+1lk) = F(k)x(klk->1 ) + M* (k) [ z(k)-H(k)x(k|k-1 )] (D o 40) 

P(k+1 ) = F(k)P(k)F ! (k) + G(k)£(k)G> (k)-M> (k)B # (k)M(k) (D.41 ) 

x(k|k) =x(k|k-1) + P(k)H' (k)B(k) [ z(k)-H(k)x(klk-1 )] (D.42) 

x(k|n) = x(k|k) + [ P(k)F» (k)-P(k)H* (k)M(k)] \(k) k=o,...,n (D.43) 

A(k-1) =[F , (k)-H , (k)M(k)] \.(k) + H* (k)B(k) [ z(k)-H(k)x(klk-1 )] (D.44) 

k-1 , o o o , n 
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(D.45) 



where B(k) and M(k) are defined in (D.25) and (D,26), The recursion 
begins by replacing x(k|k-1) for k=o by m in (D,4l ) and (D.42), Pre- 
dictions are extrapolations of x(n+lln); for example, 

x(n+2ln) = F(n+1 )x(n+1 |n) 

x(n+3ln) = F(n+2)x(n+2|n) 

For the special case S(k) = 0 for all k (D,43) becomes 
x(k|n) = x(klk) + C(k)F' (k),\(k) (D,46) 

where C(k) is given by (D«4),and (D.44) beomes 

^.(k-1) ={ I-H'(k) [ H(k)P(k)H'(k) + R(k)] # H(k)P(k) j F' (k)_\(k) 

+ H»(k) [ H(k)P(k)H» (k) + R(k)J # [ z(k)-H(k)x(k| k-1 )] 

(D.47) 

Discussion ; Equations (D.4 q), (D.41 ) and (D.42) are simply other forms 
of (D.5), (D.6) and (D.3) respectively. The generalized inverse may be 
replaced by any pseudo-inverse if (D,6) is used in place of (D.41). 

The calculation procedure is to use (D.40), (D.41 ) and (D,42) to 
obtain x(k+1|k) and x(klk) for k=o,„,.,n. Then calculate A,(k) by back- 
ward recursion of (D.44). Once A_(k) is obtained, (D.43) may be used to 
obtain x(k|n) for all k € (o,,,,,n). 

If the measurement noise is not independent, i.e., E[v(j)v' (k)] ^ ^^(k) 
we may consider this noise to be the output of a linear system excited 
by white noise and adjoin the states of this fictitious linear system to 
the basic process. There is no need to assume that F(k) is non-singular. 

We can express the solution of the linear smoothing problem in a 
simpler form by writing the expressions in terms of x(k|k-1). Com- 
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bining (DA2) and (D 0 ^3) yields 



x(kln) = x(klk-l) + P(k)H l (k)B(}c) [ z(k)-H(k)x(k I k— 1 )] 

+ P(k) [ F»(k)-H'(k)M(k)] A(k) 

This expression may be combined with (D«44) to obtain the following 
equation; which is a restatement of (D„35) ; 

x(kln) =x(klk-1) + P(k)2_(k~1 ) 

We may summarize this result in the following corollary; 

Corollary ; The solution of the linear smoothing problem is given by 
(D 8 48)„ (Do^ 0) } (D„4l ) and (D 0 44) 0 (D.44) is now valid for 11=0,...,^ 

The boundary condition ^(n) = o remains in effect* The a priori mean 
m is to be used for x(o|-1 ) in these expressions. 



(D.48) 
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APPENDIX E 



PROBABILITY DENSITY FUNCTIONALS FOR GAUSSIAN RANDOM PROCESSES 

In this appendix we shall given an introductory discussion of 
probability density functionals. Our treatment will be formal, since 
a rigorous discussion of "white noise" and probability density functionals 
is beyond the scope of this report. The concept of a probability density 
functional was introduced by Preston [ 53 J and was extended by Thomas and 
Zadeh [62] to include vector Gaussian processes. For a more abstract 
treatment the reader is referred to the work of Parzen [ 72 ]. 

Consider a vector random process [x(t)]» for o - t - T, consisting 
of a sequence of steps occurring at intervals T/N apart. Assume that the 
magnitude of the steps is normally distributed with zero mean and the 

I 

following correlation (covariance) matrix; 

E [x(iT/N)±»(jT/N)] = R(iT/N,jT/N) i,j € (1 , . . .N) (E.1 ) 

A sample function of a scalar version of such a process is shown in Fig. 

B-1. 
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The probability density function for the sequence of points 



{x(T/N) # x(2T/K),....,^x(T)] is 
P [x(T/N),...,x(T)] = C' 1 exp L \ 



4 4 x* (iT/N)H(iT/N t jT/N)x(jT/N) > 

i=1 ^ 

(B.2) 



where C is a constant of proportionality and where H and R are related 
by the following expression; 




R(iT/N,kT/N)H(kT/N,jT/N) = I 6-. 



(E.3) 



Let us define a function, G(iT/N, jT/n), as follows; 
G(iT/N,jT/N) = H(iT/N,jT/N) [n/t] 2 ; 



(E.4) 



Then (E.2) may be rewritten in the following form; 



p [x(T/N),...,x(T)] = C~' exp 

We may also rewrite (E.3) as 




2 x»(iT/N)G(iT/N f jT/N)x(jT/N) [t/m] 
3 = 1 

(E.5) 



N 

V R(iT/N,kT/N)G(kT/N,jT/N) [t/n] = I £ . , [n/t] 

Pi J (E.6) 



. If we take the limit as the resulting process {x(t)| is a 

Gaussian random process. A random process jx(t)j is said to be Gaussian 
if the joint distribution for every finite set of points j x(t^ ) , . . . ,x(t n ) j 
is normal. 

In the limit as , (E.6) becomes 
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J R(t 1 ,t)G(t,t 2 )dt = I 2 ), (o - t 1 ,t 2 - T) 

o 



(E.7) 



A matrix kernel G(t,t 2 ) satisfying (E.7) is said to be inverse to the 
correlation matrix R(t 1 ,t). 

If we were to take the limit of (E.5) as N~ , the constant 
of proportionality C would become infinite. To avoid this difficulty 
we define the probability density functional as a ratio, using a definition 
similar to that given by Thomas and Zadeh [ 62 ] . Let z r T1 be a vector of 
curves. Let h be a small positive scalar and let h be a vector, each 
element of which is equal to h. Then, the probability density functional 
of the process jx(t)j on the interval [o,T] is defined as follows; 



Px 




lim Pr [ z(t)-h <x(t) - z(t)-%, for o - t - Tj 

h — - > o — — — — — = 

Pr [-h < x(t) ^ h, for 0 - t - T~] (E.8) 




z' ^)S{t v t 2 )z{t 2 ) dt 1 dt 2 



(E.9) 



Note the similarity between (E.5) and (E.9)« 

White Gaussain noise is defined to be the formal limit of a 
Gaussian process as the values of ( x(t) } at different instants of time 
become statistically independent. In this case R(t^ ,t 2 ) becomes 
R(t.| ) <£ (t^=t 2 ) and G(t 1 ,t 2 ) becomes R (t^ ) <^(t.|-t 2 ). Then (E.9) becomes 



Px 





z^tjR" 1 (t)z(t) dt 



(E.1 0) 



= exp 



i. 

2 

o 




2 

r 1 (t) 




(E.1 1 ) 
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Probability density functionals are a powerful tool when used for 
problems involving Gaussian random processes on a finite time interval 
and will be used in Chapter III of this report. 
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